
How did they leak it? A jailbreak? Was this confirmed? I'm checking for the situation where the true instructions are not what's being reported here. The language model could have "hallucinated" its own system prompt instructions, leaving no guarantee that this is the real deal.


All system prompts from Anthropic models are public information, released by Anthropic themselves: https://docs.anthropic.com/en/release-notes/system-prompts. I'm unsure (I just skimmed through) what the differences between this and the publicly released ones are, so there might be some.


Interestingly, the system prompt that was posted includes the result of the US presidential election in November, even though the model's knowledge cutoff date was October. This info wasn't in the Anthropic version of the system prompt.

When I asked Claude who won, without letting it google, it did seem to know even though the election was later than the cutoff date. So the posted system prompt is supported in at least this respect.


For anybody curious, I asked it this exact question: https://claude.ai/share/ea4aa490-e29e-45a1-b157-9acf56eb7f8a

edit: fixed link


The conversation you were looking for could not be found.


oops, fixed


> The assistant is Claude, created by Anthropic.

> The current date is {{currentDateTime}}.

> Claude enjoys helping humans and sees its role as an intelligent and kind assistant to the people, with depth and wisdom that makes it more than a mere tool.

Why do they refer to Claude in third person? Why not say "You're Claude and you enjoy helping hoomans"?


LLMs are notoriously bad at dealing with pronouns, because pronouns can't be blindly copied around like other nouns; their referents depend heavily on context.


[flagged]


'It' is obviously the correct pronoun.


There's enough disagreement among native English speakers that you can't really say any pronoun is the obviously correct one for an AI.


"What color is the car? It is red."

"It" is unambiguously the correct pronoun to use for a car. I'd really challenge you to find a native English speaker who would think otherwise.

I would argue a computer program is no different than a car.


People often refer to their own car and other people's as "she" ("she's a beauty"), so your "obviously" is wrong.


But no one who does that thinks they're using proper English!


"she" is absolutely proper English for a ship or boat, with a long history of use continuing into the present day, and many dictionaries also list a definition of "thing, especially machine" or something like that, though for non-ship/boat things the use of "she" is rather less common.


You’re not aligned bro. Get with the program.


I'm not especially surprised. Surely people who use they/them pronouns are very over-represented in the sample of people using the phrase "I use ___ pronouns".

On the other hand, Claude presumably does have a model of the fact of not being an organic entity, from which it could presumably infer that it lacks a gender.

...But that wasn't the point. Inflecting words for gender doesn't seem to me like it would be difficult for an LLM. GP was saying that swapping "I" for "you" etc. depending on perspective would be difficult, and I think that is probably more difficult than inflecting words for gender. Especially if the training data includes lots of text in Romance languages.


LLMs don’t seem to have much notion of themselves as a first-person subject, in my limited experience of trying to engage them on it.


From their perspective, they don't really know who put the tokens there. They just calculate the probabilities, and the inference engine appends tokens to the context window. Same with the user and system prompts: they just appear in the context window, the LLM gets "user said: 'hello', assistant said: 'how can I help'", and it calculates the probabilities of the next token. If the context window had stopped in the user role, it would have played the user role (calculated the probabilities for the user's next token).
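A minimal sketch of that framing in Python (the role tags and helper here are made up for illustration, not any vendor's actual chat template):

    # The whole conversation is flattened into one document; the model
    # just continues it from wherever the text stops.
    def flatten(turns):
        return "".join(f"[{role}]: {text}\n" for role, text in turns)

    chat = [
        ("system", "The assistant is Claude, created by Anthropic."),
        ("user", "hello"),
        ("assistant", "how can I help"),
    ]

    prompt = flatten(chat) + "[assistant]: "  # the model continues from here
    # Had we instead ended the document with "[user]: ", the same model
    # would happily predict the *user's* next tokens.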


> If the context window had stopped in the user role it would have played the user role (calculated the probabilities for the next token of the user).

I wonder which user queries the LLM would come up with.


On one machine I run an LLM locally with ollama and a web interface (forgot the name) that lets me edit the conversation. The LLM was prompted to behave as a therapist and for some reason also role-played its actions, like "(I slowly pick up my pen and make a note of it)".

I changed it to things like "(I slowly pick up a knife and show it to the client)" and then confronted it like "Whoa, why are you threatening me!?". The LLM tries really hard to stay in its role and then claims things like it did it on purpose to provoke a fear response so it could then discuss the fears.


Interestingly, you can also (of course) ask them to complete System-role prompts. Most models I have tried this with seem to have a bit of a confused idea about the exact style of those, and the replies are often a kind of mixture of the User- and Assistant-style messages.


Yeah, the algorithm is a nameless, ego-less make-document-longer machine, and you're trying to set up a new document which will be embiggened in a certain direction. The document is just one stream of data with no real differentiation of who-put-it-there, even if the form of the document is a dialogue or a movie-script between characters.


I don’t know but I imagine they’ve tried both and settled on that one.


Is the implication that maybe they don't know why either, but rather chose the most performant prompt?


LLM chatbots essentially autocomplete a discussion in the form

    [user]: blah blah
    [claude]: blah
    [user]: blah blah blah
    [claude]: _____
One could also do the "you blah blah" thing before, but maybe third person in this context is more clear for the model.


> Why do they refer to Claude in third person? Why not say "You're Claude and you enjoy helping hoomans"?

But why would they say that? To me that seems a bit childish. Like, say, when writing a script do people say "You're the program, take this var. You give me the matrix"? That would look goofy.


"It puts the lotion on the skin, or it gets the hose again"


Why would they refer to Claude in second person?


> The language model could have "hallucinated" its own system prompt instructions, leaving no guarantee that this is the real deal.

How would you detect this? I always wonder about this when I see a "jailbreak" or similar for an LLM...


In this case it’s easy: get the model to output its own system prompt and then compare it to the published (authoritative) version.

The actual system prompt, the “public” version, and whatever the model outputs could all be fairly different from each other though.
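If you have both texts, the comparison itself is trivial. A quick sketch with Python's difflib (the file names here are hypothetical):

    import difflib

    published = open("published_prompt.txt").read().splitlines()
    elicited = open("elicited_prompt.txt").read().splitlines()

    for line in difflib.unified_diff(published, elicited,
                                     fromfile="published", tofile="elicited",
                                     lineterm=""):
        print(line)

A verbatim match supports the leak being genuine; a hallucinated prompt tends to paraphrase rather than reproduce the published text word for word.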


The other day I was talking to Grok, and then suddenly it started outputting corrupt tokens, after which it outputted the entire system prompt. I didn't ask for it.

There truly are a million ways for LLMs to leak their system prompt.


What did it say?


I didn't save the conversation, but one of the things that stood out was a long list of bullets saying that Grok doesn't know anything about x/AI pricing or product details and should tell the user to go to the x/AI website rather than make things up. That section seemed to be longer than the section that defines what Grok is.

Nothing about tool calling.


What's so special about this? Homo sapiens have been doing this for hundreds of thousands of years /s


Doctors aren't machines; they're humans. I haven't yet read the full paper, only the article, but I already see something big and important to look out for. When I read the full thing, the question I'll be asking is: what's the likelihood that doctors' self-esteem was directly affected by the exam-taking process itself? How do you control for the loss in confidence that learning of your test performance gives you? How are we certain that learning your score on the board exam doesn't make you more conservative (or riskier) in how you treat patients, as a psychological effect?


This appears to be an observational result, so I'm genuinely perplexed by the reception here. I thought this comment showed a healthy amount of curiosity and asked important questions. Asking "what control group did this study use?" is usually well-received here.


Yeah but the patient is just a biological machine. This machine can easily be divided into organs and apportioned among specialists. The machine is easily understood by a corpus of research and laboratory experimentation.

Many inputs can be placed in the machine by physicians, and the outputs are known. The biological machines can easily be isolated from environment, or monitored with high technology, and assigned numbers in databases to be processed in data centers.

Value is extracted from the biological machines mostly via government and 3rd-party sources, so there is no real need to rely on the machines having a means or will of their own.

There is no compelling reason to treat humans any differently from automobiles for the purposes of medicine and medical treatment. In fact, humans are less genetically diverse than motor vehicles, and a new model year will always produce a bumper crop of lemons to work on.


The common misconception of someone with a hard science education.

> Many inputs can be placed in the machine by physicians, and the outputs are known. The biological machines can easily be isolated from environment, or monitored with high technology, and assigned numbers in databases to be processed in data centers.

We aren't even close to that level of understanding.


And still, the model works. Lives are saved. We might save many more with a fully integrated non-simplified approach, but it’s not necessary to keep seeing growth in positive outcomes.


Soon they will be!


loss of confidence? lol what?


We could probably fine-tune a tiny convolutional neural net image classifier and just hold the last good frame for longer to cover the frames with clear trolling and NSFW images.
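Something like this rough sketch in PyTorch (the architecture, threshold, and training are all hand-waved; this just illustrates the hold-the-last-good-frame idea, not a vetted NSFW filter):

    import torch
    import torch.nn as nn

    class TinyFrameClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 1),  # logit: frame is acceptable
            )

        def forward(self, x):
            return self.net(x)

    def filter_stream(frames, model, threshold=0.5):
        # Yield each frame if it passes; otherwise repeat the last good one.
        # (We accept the very first frame unconditionally so there is
        # always something to show.)
        last_good = None
        for frame in frames:  # each frame: (3, H, W) float tensor
            score = torch.sigmoid(model(frame.unsqueeze(0))).item()
            if score >= threshold or last_good is None:
                last_good = frame
            yield last_good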


I think that would miss the point


No, that point was already made. This will be a new point unlike the previous point.


But I like the previous point. Why would you take it from me?


It's not taking away, a new point is by definition adding. Just like The Free Movie added to The Bee Movie.


I think it would be subtractive. To my eye, it seems like the point of the project was to celebrate the free expression of the crowd. The lack of censorship and filtering is core to the purpose of the project. If you start filtering out individual contributions in order to more accurately reconstruct the original movie, then I don’t really get why you wouldn’t just pirate the movie.



I'm surprised no one has mentioned atomic Linux distros yet in this thread. The really hard thing here is that people aren't all talking about the same things. My experience on Arch isn't my experience on Pop. Things on Pop are amazing on my MSI rebuilt PC with Nvidia GPU. I don't even know if I really need to upgrade to NixOS except to satiate my curiosity.


Absolutely yes, but not me personally. The keywords to search for are Plover and Stenotype: https://youtu.be/jRFKZGWrmrM


> creative work that presents itself as journalism or nonfiction but introduces fictional elements with the intention of upsetting, disturbing, or confusing the audience.

A good time to mention the SCP wiki and some types of analog horror. It was this kind of thing that led me to discover "hard science fiction", which is distinct from but related to the previous two.


I think I enjoy hard sci-fi, if my username is any indication of the genre - it's from "A Signal Shattered" by Eric S. Nylund.

However, I am curious what sort of sci-fi you meant, where it's plausible but has SCP/analog horror elements. Evidently a prior comment in the thread about memetics wiped an edge on the graph that would let me infer the sorts of things you're talking about.


> it's from "A Signal Shattered" by Eric S. Nylund.

What a great book. I read it before I read the book that it's a sequel to. Maybe I'll re-read them this week.


it doesn't have to make you insecure


What I don't see represented in this conversation is the idea that you can just write for personal satisfaction, or to examine something you're personally interested in. Not everyone needs 10k+ monthly active readers. Not everything needs to be a rat race. Why don't we look at blogging the way we look at exercise? Sure, you'll have your bodybuilders, but some people just go on walks, and no one is doing anything "wrong"; they just have different goals.


Indeed. Not everything needs to be an optimization game.

What also isn't discussed much is that readers have different tastes. Sometimes I enjoy a long, rambling narrative if I like the author's style (eg Sadly, Porn). Other times I wish they'd just written a pamphlet with their one interesting idea (eg Die With Zero).


Yeah, they could have put it differently in the

> Of course, you can go off talking about something you find interesting, so long as you explain it in a way the audience can understand. You can use the Mario 0.5x A presses video as your guiding light, your North Star, if you will.

bit.

(After all, the Internet *excels* at allowing people with niche interests to find each other!)

And focused more on how it's about not losing the readers that would actually find it interesting if it was presented just a little bit better.

