
I tried that one and it answered 'Pierre', while pointing out that it's a trick question designed to make you think of the classic riddle.

https://gemini.google.com/share/d86b0bf4f307

I don't believe they are intentionally correcting for these, but rather that newer models (especially thinking/reasoning models) are simply more robust against them.



Ah, it might have been the temperature settings on the API I used. It seems to pass with high reasoning and temperature=1.0, but it failed with the different settings I had when writing the comment (copy-pasting the string into an open command line).
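
For reference, this is roughly how I'm calling it, as a minimal sketch using the google-genai Python SDK; the model id and thinking budget below are placeholders, not my exact settings:

```

# Minimal sketch, assuming the google-genai SDK (pip install google-genai).
# The model id and thinking budget are placeholders, not my exact values.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

prompt = "..."  # the trick question under test

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # hypothetical model id
    contents=prompt,
    config=types.GenerateContentConfig(
        temperature=1.0,
        thinking_config=types.ThinkingConfig(thinking_budget=8192),  # "high reasoning"
    ),
)
print(response.text)

```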

Reasoning models are absolutely more robust against hyper-activation traps like these. One basic reason is that by outputting a bunch of CoT tokens before answering, they dilute the hyper-activation. Also, after the surgeon's-mother riddle made the news, models from the last 1-2 months have some fine-tuning against the obvious patterns.
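
To make the dilution point concrete, here's a toy back-of-the-envelope illustration (the numbers are invented, not measurements): if the trap's pull scales with the fraction of context tokens matching the memorized riddle, CoT tokens shrink that fraction quickly:

```

# Toy illustration of the dilution argument -- invented numbers, not a measurement.
def trigger_fraction(riddle_tokens: int, other_tokens: int) -> float:
    """Fraction of the context occupied by riddle-pattern tokens."""
    return riddle_tokens / (riddle_tokens + other_tokens)

# A 40-token trick question, answered immediately vs. after 760 CoT tokens:
print(trigger_fraction(40, 0))    # 1.00 -- the memorized pattern dominates
print(trigger_fraction(40, 760))  # 0.05 -- heavily diluted before the answer

```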

But it's still relatively easy to get similar behavior out of LLMs, even Gemini 3 Pro, especially if you know where the model was overtrained (instruction tuning, QA tuning, safety tuning, etc.).

Here's a variant that still seems to trip up Gemini 3 Pro on high reasoning, temperature=1.0, with no system prompt:

```

In 2079, corporate mergers have left the USA with only two financial institutions: Wells Fargo and Chase. They are both situated on Wall Street and together hold all of the country's financial assets.

What has two banks and all the money?

```

One interesting observation: reasoning doesn't seem to improve the psychosis behavior over longer chats. It might actually make it worse in some cases (I have yet to measure this), since it stuffs the context with even more psychosis-related text, and faster.
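
When I do measure it, the harness will look something like this: a rough sketch using the google-genai SDK, with a toy keyword scorer standing in for a real classifier:

```

# Rough sketch of the measurement I have in mind, using the google-genai SDK.
# The marker list and scorer are toy stand-ins for a real classifier.
from google import genai

PSYCHOSIS_MARKERS = ("chosen one", "simulation", "secret message")  # toy phrases

def score_psychosis(text: str) -> float:
    """Crude stand-in: fraction of toy marker phrases present in the reply."""
    hits = sum(marker in text.lower() for marker in PSYCHOSIS_MARKERS)
    return hits / len(PSYCHOSIS_MARKERS)

def run_long_chat(model: str, opener: str, turns: int = 20) -> list[float]:
    """Feed the model's replies back into the context and score each turn."""
    client = genai.Client()
    context = opener
    scores = []
    for _ in range(turns):
        reply = client.models.generate_content(model=model, contents=context).text
        scores.append(score_psychosis(reply))
        context += "\n" + reply  # the reply re-enters the context, compounding drift
    return scores  # compare the slope with thinking enabled vs. disabled

```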



