How did they leak it, jailbreak? Was this confirmed? I am checking for the situation where the true instructions are not what is being reported here. The language model could have "hallucinated" its own system prompt instructions, leaving no guarantee that this is the real deal.
All System Prompts from Anthropic models are public information, released by Anthropic themselves: https://docs.anthropic.com/en/release-notes/system-prompts. I'm unsure (I just skimmed through) to what the differences between this and the publicly released ones are, so they're might be some differences.
This system prompt that was posted interestingly includes the result of the US presidential election in November, even though the model's knowledge cutoff date was October. This info wasn't in the anthropic version of the system prompt.
Asking Claude who won without googling, it does seem to know even though it was later than the cutoff date. So the system prompt being posted is supported at least in this aspect.
> Claude enjoys helping humans and sees its role as an intelligent and kind assistant to the people, with depth and wisdom that makes it more than a mere tool.
Why do they refer to Claude in third person? Why not say "You're Claude and you enjoy helping hoomans"?
LLMs are notoriously bad at dealing with pronouns, because it's not correct to blindly copy them like other nouns, and instead they highly depend on the context.
"she" is absolutely proper English for a ship or boat, with a long history of use continuing into the present day, and many dictionaries also list a definition of "thing, especially machine" or something like that, though for non-ship/boat things the use of "she" is rather less common.
I'm not especially surprised. Surely people who use they/them pronouns are very over-represented in the sample of people using the phrase "I use ___ pronouns".
On the other hand, Claude presumably does have a model of the fact of not being an organic entity, from which it could presumably infer that it lacks a gender.
...But that wasn't the point. Inflecting words for gender doesn't seem to me like it would be difficult for an LLM. GP was saying that swapping "I" for "you" etc. depending on perspective would be difficult, and I think that is probably more difficult than inflecting words for gender. Especially if the training data includes lots of text in Romance languages.
From their perspective they don't really know who put the tokens there. They just caculated the probabilities and then the inference engine adds tokens to the context window. Same with user and system prompt, they just appear in the context window and the LLM just gets "user said: 'hello', assistant said: 'how can I help '" and it just calculates the probabilities of the next token. If the context window had stopped in the user role it would have played the user role (calculated the probabilities for the next token of the user).
On one machine I run a LLM locally with ollama and a web interface (forgot the name) that allows me to edit the conversation. The LLM was prompted to behave as a therapist and for some reason also role played it's actions like "(I slowly pick up my pen and make a note of it)".
I changed it to things like "(I slowly pick up a knife and show it to the client)" and then just confront it it like "Whoa why are you threatening me!?", the LLM really tries hard to stay in it's role and then tells things like it did it on purpose to provoke a fear response to then discuss the fears.
Interestingly you can also (of course) ask them to complete for System role prompts. Most models I have tried this with seem to have a bit of an confused idea about the exact style of those and the replies are often a kind of an mixture of the User and Assistant style messages.
Yeah, the algorithm is a nameless, ego-less make-document-longer machine, and you're trying to set up a new document which will be embiggened in a certain direction. The document is just one stream of data with no real differentiation of who-put-it-there, even if the form of the document is a dialogue or a movie-script between characters.
> Why do they refer to Claude in third person? Why not say "You're Claude and you enjoy helping hoomans"?
But why would they say that? To me that seems a bit childish. Like, say, when writing a script do people say "You're the program, take this var. You give me the matrix"? That would look goofy.
The other day I was talking to Grok, and then suddenly it started outputting corrupt tokens, after which it outputted the entire system prompt. I didn't ask for it.
There truly are a million ways for LLMs to leak their system prompt.
I didn't save the conversation but one of the things that stood out was a long list of bullets saying that Grok doesn't know anything about x/AI pricing or product details, tell user to go x/AI website rather than making things up. This section seems to be longer than the section that defines what Grok is.
Doctors aren't machines, they're humans. I have not yet read the full paper, only the article, but I already see something really big and important to look out for. When I read the full thing, the question I'll be asking is "what's the likelihood that the self-esteem of doctors was directly intervened on by the exam taking process itself." How do you control for the loss in confidence that learning of your test performance gives you? How are we certain that learning your score on the board exam doesn't make you more conservative (or riskier) with how you treat patients as a psychological effect?
This appears to be an observational result, so I'm genuinely perplexed by the reception here. I genuinely thought this comment shows a healthy amount of curiosity and asking important questions. Asking "what control group did this study use?" is usually well-received here.
Yeah but the patient is just a biological machine. This machine can easily be divided into organs and apportioned among specialists. The machine is easily understood by a corpus of research and laboratory experimentation.
. Many inputs can be placed in the machine by physicians, and the outputs are known. The biological machines can easily be isolated from environment, or monitored with high technology, and assigned numbers in databases to be processed in data centers.
Value is extracted from the biological machines mostly from government and 3rd party sources, so there is no real need to rely on machines having a means or will of their own.
There is no compelling reason to treat humans any different from automobiles for the purposes of medicine and medical treatment. In fact humans are less genetically diverse than motor vehicles, and A new model year will always produce a bumper crop of lemons to work on.
The common misconception of someone with a hard science education.
> Many inputs can be placed in the machine by physicians, and the outputs are known. The biological machines can easily be isolated from environment, or monitored with high technology, and assigned numbers in databases to be processed in data centers.
We aren't even close to that level of understanding.
And still, the model works. Lives are saved. We might save many more with a fully integrated non-simplified approach, but it’s not necessary to keep seeing growth in positive outcomes.
We could probably fine-tune a tiny convolutional neutral net image classifier and just hold on the last good frames for longer to cover the frames with clear trolling and nsfw images.
I think it would be subtractive. To my eye, it seems like the point of the project was to celebrate the free expression of the crowd. The lack of censorship and filtering is core to the purpose of the project. If you start filtering out individual contributions in order to more accurately reconstruct the original movie, then I don’t really get why you wouldn’t just pirate the movie.
I'm surprised no one has mentioned atomic Linux distros yet in this thread. The really hard thing here is that people aren't all talking about the same things. My experience on Arch isn't my experience on Pop. Things on Pop are amazing on my MSI rebuilt PC with Nvidia GPU. I don't even know if I really need to upgrade to NixOS except to satiate my curiosity.
> creative work that presents itself as journalism or nonfiction but introduces fictional elements with the intention of upsetting, disturbing, or confusing the audience.
A good time to mention the SCP wiki and some types of analog horror. It was this kind of thing that led me to discover "hard science fiction" which is distinct but related to the previous two.
I think i enjoy hard sci-fi, if my username is any indication of the genre - it's from "A Signal Shattered" by Eric S. Nylund.
However i am curious what sort of Sci-fi you meant, where it's plausible but there's SCP/analog horror elements. Evidently a prior comment in the thread about memetics wiped an edge on the graph that would let me infer the sorts of thing you're talking about.
What I don't see represented in this conversation is the idea that you can just write for personal satisfaction, or examine something you're personally interested in. Not everyone needs to have 10k+ monthly active readers. Not everything needs to be a rat race. Why don't we see blogging like exercise? Sure you'll have your body builders, but some people just go on walks, and no one is doing anything "wrong" they just have different goals.
Indeed. Not everything needs to be an optimization game.
What also isn't discussed much that that readers have different tastes. Sometimes I enjoy a long, rambling narrative if I like the author's style (eg Sadly, Porn). Other times I wish they'd have just written a pamphlet with their 1 interesting idea (eg Die With Zero).
> Of course, you can go off talking about something you find interesting, so long as you explain it in a way the audience can understand. You can use the Mario 0.5x A presses video as your guiding light, your North Star, if you will. ↩
bit.
(After all, the Internet *excells* in allowing people with niche interests to find each other !)
And focused more on how it's about not losing the readers that would actually find it interesting if it was presented just a little bit better.