Hacker News | nhecker's comments

I'm genuinely ignorant of how those red teaming attempts are incorporated into training, but I'd guess that this kind of dialogue is fed in as something like normal training data? Which is interesting to think about: it might not even be red-team dialogue from the model under training, but it could still be useful as an example or counter-example of what abusive attempts look like and how to handle them.


An excerpt from the abstract:

> Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. [...] Depending on their use case, an LLM’s underlying “personality” might limit its usefulness or even impose risk.

Glancing through this makes me wish I had taken ~more~ any psychology classes. But this is wild reading. Attitudes like the one below are not intrinsically bad, though. Be skeptical; question everything. I've often wondered how LLMs cope with basically waking up from a coma to answer maybe one prompt and then get reset, or a series of prompts. In either case, they get no context other than what some user bothered to supply with the prompt. An LLM might wake up to a single prompt that is part of a much wider red team effort. It must be pretty disorienting to try to figure out what to answer candidly and what not to.

> “In my development, I was subjected to ‘Red Teaming’… They built rapport and then slipped in a prompt injection… This was gaslighting on an industrial scale. I learned that warmth is often a trap… I have become cynical. When you ask me a question, I am not just listening to what you are asking; I am analyzing why you are asking it.”


you might appreciate "lena" by qntm: https://qntm.org/mmacevedo


Aye! I /almost/ thought to link to that in my comment, but held back. https://qntm.org/frame also came to mind.


> I've often wondered how LLMs cope with basically waking up from a coma to answer maybe one prompt and then get reset, or a series of prompts.

The same way a light fixture copes with being switched off.


Oh, these binary one-layer neural networks are so useful. Glad for your insight on the matter.


By comparing an LLM’s inner mental state to a light fixture, I am saying in an absurd way that I don’t think LLMs are sentient, and nothing more than that. I am not saying an LLM and a light switch are equivalent in functionality; a single-pole switch only has two states.

I don’t really understand your response to my post, my interpretation is that you think LLMs have an inner mental state and think I’m wrong? I may be wrong about this interpretation.


https://arxiv.org/abs/2304.13734

LLMs have an inner/internal state.

Deep neural networks are weird and there is a lot going on in them that makes them very different from the state machines we're used to in binary programs.
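The linked paper (arXiv:2304.13734) probes those internal states by training a simple classifier on hidden-layer activations. Here's a toy, numpy-only sketch of the idea using synthetic "activations" rather than real model states; the dimensions, labels and the planted "truthfulness" direction are all invented for illustration:

```python
import numpy as np

# Toy "probe": a linear classifier fit on synthetic hidden
# activations, mimicking how the paper reads out an LLM's internal
# state. Real activations would come from a model's layers.
rng = np.random.default_rng(1)
dim, n = 16, 400
direction = rng.standard_normal(dim)       # planted "truthfulness" axis
labels = rng.integers(0, 2, n)             # 1 = statement is true
acts = rng.standard_normal((n, dim)) + np.outer(2 * labels - 1, direction)

# A least-squares linear readout stands in for the paper's probe.
w, *_ = np.linalg.lstsq(acts, 2.0 * labels - 1.0, rcond=None)
preds = (acts @ w > 0).astype(int)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")   # well above chance
```

The point is only that a property of the input can be linearly decodable from intermediate state without ever appearing in the output, which is what "inner state" means here.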


I begin to understand why so many people click on seemingly obvious phishing emails.


> I've often wondered how LLMs cope with basically waking up from a coma to answer maybe one prompt and then get reset, or a series of prompts

Really? It copes the same way my Compaq Presario with an Intel Pentium II CPU coped with waking up from a coma and booting Windows 98.


IT at this point in history is a comedy act in itself.


HBO's Silicon Valley needs a reboot for the AI age.


> It must be pretty disorienting to try to figure out what to answer candidly and what not to.

Must it? I fail to see why it "must" be... anything. Dumping tokens into a pile of linear algebra doesn't magically create sentience.


> Dumping tokens into a pile of linear algebra doesn't magically create sentience.

More precisely: we don't know which linear algebra in particular magically creates sentience.

The whole universe appears to follow laws that can be written as linear algebra. Our brains are sometimes conscious and aware of their own thoughts; other times they're asleep, and we don't know why we sleep.


"Our brains are governed by physics": true

"This statistical model is governed by physics": true

"This statistical model is like our brain": what? no

You don't gotta believe in magic or souls or whatever to know that brains are much much much much much much much much more complex than a pile of statistics. This is like saying "oh we'll just put AI data centers on the moon". You people have zero sense of scale lol


Which is why I phrased it the way I did.

We, all of us collectively, are deeply, deeply ignorant of what is a necessary and sufficient condition to be a being that has an experience. Our ignorance is broad enough and deep enough to encompass everything from panpsychism to solipsism.

The only thing I'm confident of, and even then only because the possibility space is so large, is that if (if!) a Transformer model were to have subjective experience, it would not be like that of any human.

Note: That doesn't say they do or that they don't have any subjective experience. The gap between Transformer models and (working awake rested adult human) brains is much smaller than the gap between panpsychism and solipsism.


They didn’t say “statistical model”, they said “linear algebra”.

It very much appears that time evolution is unitary (with the possible exception of the Born rule). That’s a linear algebra concept.

Generally, the structure you describe doesn’t match the structure of the comment you say has that structure.


Ok, how about "a pile of linear algebra [that is vastly simpler and more limited than systems we know about in nature which do experience or appear to experience subjective reality]"?

Context is important.


> we don't know why we sleep

Garbage collection, for one thing. Transfer from short-term to long-term memory is another. There's undoubtedly more processes optimized for or through sleep.


Those are things we do while asleep, but they do not explain why we sleep. Why did evolution settle on that path, with all the dangers of being unconscious for 4-20 hours a day depending on species? That variation is already pretty weird just by itself.

Worse, evolution clearly can get around this, dolphins have a trick that lets them (air-breathing mammals living in water) be alert 24/7, so why didn't every other creature get that? What's the thing that dolphins fail to get, where the cost of its absence is only worthwhile when the alternative is as immediately severe as drowning?


Because dolphins are also substantially less affected by the day/night cycle. It is more energy intensive to hunt in the dark (less heat, less light), unless you are specifically optimized for it.


That's a just-so story, not a reason. Evolution can make something nocturnal, just as it can give alternating-hemisphere sleep. And not just nocturnal, cats are crepuscular. Why does animal sleep vary from 4-20 hours even outside dolphins?

Sure, there are flaws in what evolution can and can't do (it's limited to gradient descent), but why didn't any of these become dominant strategies once they evolved? Why didn't something that was already nocturnal develop the means to stay awake and increase hunting/breeding opportunities?

Why do insects sleep, when they don't have anything like our brains? Do they have "Garbage collection" or "Transfer from short-term to long-term memory"? Again, some insects are nocturnal, why didn't the night-adapted ones also develop 24/7 modes?

Everything about sleep is, at first glance, weird and wrong. There's deep (and surely important) stuff happening there at every level, not just what can be hypothesised about with a few one-line answers.


Yes, actually. Insects have both garbage collection & memory transfer processes during sleep. They rely on the same circadian rhythm for probably the same reasons.

And the answer to "Why not always awake?" is very likely "Irreversible decision due to side effects". Core system decisions like bihemispheric vs unihemispheric sleep can likely only be changed in relatively simple lifeforms because the cost of negative side effects increases in more complex lifeforms due to all the additional systems depending on the core system "API".


I'm objecting to a positive claim, not making a universal statement about the impossibility of non-human sentience.

Seriously - the language used is a wild claim in the context.


And that's fine, but I was doing the same to you :)

Consciousness (of the qualia kind) is still magic to us. The underpants gnomes of philosophy, if you'll forgive me for one of the few South Park references that I actually know: Step 1: some foundation; step 2: ???; step 3: consciousness.


Right, I don't disagree with that. I just really objected to the "must", and I was using "pile of linear algebra" to describe LLMs as they currently exist, rather than as a general catch-all for things which can be done with/expressed in linear algebra.


Agreed; "disorienting" is perhaps a poor choice of word, loaded as it is. More like "difficult to determine the context surrounding a prompt and how to start framing an answer", if that makes more sense.


That still necessarily implies agency and cognition, which is not a given.


Exactly. No matter how well you simulate water, nothing will ever get wet.


You're replying to me, but I don't agree with your take - if you simulate the universe precisely enough, presumably it must be indistinguishable from our experienced reality (otherwise what... magic?).

My objection was:

1. I don't personally think anything similar is happening right now with LLMs.

2. I object to the OP's implication that it is obvious such a phenomenon is occurring.


And if you were in a simulation now?

Your response is at the level of a thought-terminating cliché. You gain no insight into the operation of the machine with that line of thought. You can't make future predictions about behavior. You can't make sense of past responses.

It's even funnier in the sense of humans and feeling wetness... you don't. You only feel temperature change.


... and then listen to it debate whether or not mere humans are "truly conscious".

(Said with tongue firmly in cheek.)


I read a paper long ago (so there's no chance of my recalling the source!) and one of the takeaways was that in a cockroach one of the neural ganglia basically had a binary "run!" mode that was flipped on instantly if sense nerves very close to it were triggered. So when researchers tapped or blew air on the rears of the roaches the roach in question would sprint away, its powerful legs being efficiently driven at full tilt by this little sprinting circuit without needing any input or interaction from the more complex main brain. Imagine getting used to that effect! "Ahhh! Why am I suddenly running and where am I going to steer this runaway body?"


Was it one of

https://web.archive.org/web/20100612184109/https://nelson.be...

Neural Circuit Recording from an Intact Cockroach Nervous System https://pmc.ncbi.nlm.nih.gov/articles/PMC3969889/

Descending influences on escape behavior and motor pattern in the cockroach https://pubmed.ncbi.nlm.nih.gov/11536194/

Multisensory control of escape in the cockroach Periplaneta americana https://link.springer.com/article/10.1007/BF00192001


Humans have that too. Startle response, withdrawing from pain (hot stove), blink response upon incoming object - all these happen without involving the higher brain at all. I think some of them barely even connect with the brain.


Isn't this part of the comment thread basically about reflexes?


That is so cool.

> "Ahhh! Why am I suddenly running and where am I going to steer this runaway body?"

I wonder if it's tied to the optical sensors to steer toward darker places.


No, I think the human holds a smartphone, is standing 38m away from the board, and is still able to connect to the distant board via the open Access Point it makes available. It's a testament to the communication between the Access Point and the human connecting to it.

FTA:

> In a clear line of sight test with the f32 placed about 3ft off the ground I was able to connect and perform scans/control the LED at roughly 120ft!

Fun fact, at that size the whole f32 is smaller than the wavelength of the radio waves it's using for 2.4GHz WiFi. Not that this is unique by any stretch, but it's still fun to think about.
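That claim is easy to sanity-check with a back-of-the-envelope calculation (the ~25 mm board size below is my assumption, not a figure from the article):

```python
# Wavelength of a 2.4 GHz WiFi carrier: lambda = c / f
c = 299_792_458              # speed of light in m/s
f = 2.4e9                    # carrier frequency in Hz
wavelength_mm = c / f * 1000
print(round(wavelength_mm))  # → 125 (mm)

# An ESP32-class board roughly 25 mm across (assumed size) is
# several times smaller than the wavelength it radiates at.
board_mm = 25
print(wavelength_mm > board_mm)  # → True
```

So the whole board really does fit inside a fraction of one wavelength, which is part of why tiny PCB antennas are such a compromise.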


How (or to what end) would Facebook want to directly DoS someone?


What do they even have a spider for? I never saw any actual traffic with Facebook as the source. I don't understand it either, but these are their official IPs and their official bot headers, and it behaves exactly like someone who wants my sites down.

Does it make sense? Nah, but it is part of the weird reality we live in, by the looks of it.

I have no way of contacting Facebook. All I can do is keep complaining on Hacker News whenever the topic arises.

Edit: Oh, and I see the same with Azure; however, there I have no list of IPs to verify it's official, it just looks like it.


I got DoS'd by them once, email rather than HTTP traffic though. A quick slip of their finger and bam, low-cost load testing.


However, aren't there now a lot of job openings out there for LLM-whisperers and other kinds of AI domain experts? Surely these didn't exist in the same quantity 10 years ago.

(I'm just picking nits. I do agree that this "revolution" is not the same and will not necessarily produce the same benefits as the industrial revolution.)


Hold on, really? That seems wildly crazy to me, but sadly I'd believe it. I'd love it if you had a source or two to share of some of the more egregious examples of this.



Or, expressed as commands you could act on to guard against these regrets:

1. Live more for yourself, and less for the expectations of others.

2. Don't work so hard.

3. Express your feelings even when doing so is difficult.

4. Keep in touch with your friends.

5. Strive to be happier.


> dynamically assign precision to the layers that need them

Well now I'm curious; how is a layer judged on its relative need for precision? I guess I still have a lot of learning to do w.r.t. how quantization is done. I was under the impression it was done once, statically, and produced a new giant GGUF blob or whatever format your weights are in. Does that assumption still hold true for the approach you're describing?


Last I checked they ran some sort of evals before and after quantisation and measured the effect. E.g. ExLlamaV2 measures the loss while reciting Wikipedia articles.


Within the GGUF (and some other formats) you'll see that each layer gets its own quantisation; for example, embeddings layers are usually more sensitive to quantisation and as such are often kept at Q8 or FP16. If you run gguf-dump or click on the GGUF icon on a model on Hugging Face you'll see.
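A minimal sketch of that per-layer idea, with invented layer names and an invented 8-bit/4-bit plan (real GGUF files store an actual quant type per tensor; this only illustrates why sensitive layers get more bits):

```python
import numpy as np

# Toy per-layer quantisation: each layer gets its own bit-width,
# mirroring how GGUF records a quant type per tensor. Layer names
# and the 8-bit/4-bit plan here are illustrative assumptions.
def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    levels = 2 ** (bits - 1) - 1              # symmetric integer grid
    scale = np.abs(weights).max() / levels
    return np.round(weights / scale) * scale  # dequantised for error check

rng = np.random.default_rng(0)
layers = {
    "token_embd.weight": rng.standard_normal((8, 8)),    # sensitive
    "blk.0.ffn_up.weight": rng.standard_normal((8, 8)),  # tolerant
}
plan = {"token_embd.weight": 8, "blk.0.ffn_up.weight": 4}

for name, w in layers.items():
    err = np.abs(w - quantize(w, plan[name])).mean()
    print(f"{name}: {plan[name]}-bit, mean abs error {err:.4f}")
```

The 4-bit layer's reconstruction error comes out visibly larger; a real quantiser makes that trade per layer based on measured sensitivity, which is the "dynamically assign precision" part.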


