I ran a few replacements because the puzzle is probably in the training data som...

mbernstein · 2025-02-02T05:04:14 1738472654

I just did a pass with some replacements with o1 and it very much still recognized it as the Einstein riddle and actually seems to have cheated a bit :)

"Revisiting assumptions

Considering "Camels" might be a mistake for "Kools," leading to confusion. This inconsistency complicates solving the puzzle, showing the need for careful brand assignment.

Tracking puzzle progress

I’m mapping out various house and nationality combinations, but the classic conclusion is the Norwegian drinks water and the Japanese owns the zebra.

Analyzing the arrangement

I’m working through the classic puzzle structure and noting variations, while consistently identifying the Norwegian drinking water and the Japanese owning the zebra as the final solution."

simonw · 2025-02-02T05:07:35 1738472855

Hah, that's fun. My o3-mini-high transcript didn't hint that it recognized the puzzle and looked legit when I scanned through it, but I'm still very suspicious since this is evidently such a classic puzzle.

I should have changed the cigarette brands to something else too.

thaumasiotes · 2025-02-02T21:38:07 1738532287

If you want to make a cosmetic change to the puzzle, you might try eliminating the massive quantity of implicit information in "the green house is immediately to the right of the ivory house".

After doing some substitutions on what it means to be in positions 1/2/3/4/5:

A. If the ivory house is in London, the green house is in Madrid.

B. If the ivory house is in Madrid, the green house is in Kiev.

C. If the ivory house is in Kiev, the green house is in Oslo.

D. If the ivory house is in Oslo, the green house is in Tokyo.

E. The ivory house is not in Tokyo.

9. Milk is drunk in Kiev.

11(A). If the man who smokes Chesterfields lives in Tokyo, the man with the fox lives in Oslo.

12(A). If the man with the horse lives in Oslo, Kools are smoked in either Tokyo or Kiev.

15(A). If the blue house is in Madrid, the Norwegian lives either in London or in Kiev.

[...]
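The expansion above is mechanical, so it can be generated. A minimal Python sketch, assuming positions 1 to 5 map to London, Madrid, Kiev, Oslo, Tokyo (which is what clauses A-E and 9 imply); the names `CITIES`, `right_of_clauses` and `next_to_clauses` are made up for illustration:

```python
# Hypothetical helpers: rewrite positional clues as per-city conditionals.
CITIES = ["London", "Madrid", "Kiev", "Oslo", "Tokyo"]  # assumed positions 1..5


def right_of_clauses(left, right, cities=CITIES):
    """Expand '<right> is immediately to the right of <left>'."""
    clauses = [
        f"If {left} is in {here}, {right} is in {there}."
        for here, there in zip(cities, cities[1:])
    ]
    # The left-hand item can never sit in the last position.
    clauses.append(f"{left[0].upper()}{left[1:]} is not in {cities[-1]}.")
    return clauses


def next_to_clauses(a, b, cities=CITIES):
    """Expand '<a> is next to <b>' into one conditional per city."""
    clauses = []
    for i, here in enumerate(cities):
        neighbours = [cities[j] for j in (i - 1, i + 1) if 0 <= j < len(cities)]
        clauses.append(f"If {a} is in {here}, {b} is in {' or '.join(neighbours)}.")
    return clauses


if __name__ == "__main__":
    for line in right_of_clauses("the ivory house", "the green house"):
        print(line)
    for line in next_to_clauses("the man with the horse", "the Kools smoker"):
        print(line)
```

Each "next to" clue turns into five conditionals this way, so the rewritten puzzle is much longer than the original even though it carries the same information.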

Another easy change is to exchange categories. Swap the animals for the drinks and instead of "the Spaniard owns the dog" and "the Ukrainian drinks tea", you'll have "the Spaniard drinks tea" and "the Ukrainian owns the fox" (depending on which equivalences you decide on). It won't make any difference to the puzzle, but it will permute the answer.

valenterry · 2025-02-02T06:10:00 1738476600

Try flipping the order, adding a few nonsense steps and combining 2 steps into one and also splitting a single step into two. And then see what happens and post it here. :-)

simonw · 2025-02-02T06:38:57 1738478337

I tried it against deepseek-r1-distill-llama-70b running on Groq (which is really fast) and it didn't get the right answer: https://gist.github.com/simonw/487c4c074cd6ad163dba061e1e594...

I ran it like this:

  llm -m groq/deepseek-r1-distill-llama-70b '
    H01 There are five huts.
    H02 The Scotsman lives in the purple hut.
    H03 The Welshman owns the parrot.
    H04 Kombucha is drunk in the scarlet hut.
    H05 The Romanian drinks butterscotch.
    H06 The scarlet hut is immediately to the right of the pink hut.
    H07 The Old Gold smoker owns scorpions.
    H08 Kools are smoked in the turquoise hut.
    H09 Red Bull is drunk in the middle hut.
    H10 The Brazilian lives in the first hut.
    H11 The man who smokes Chesterfields lives in the hut next to the man with the bear.
    H12 Kools are smoked in a hut next to the hut where the mule is kept.
    H13 The Lucky Strike smoker drinks rum.
    H14 The German smokes Parliaments.
    H15 The Brazilian lives next to the brown hut.
    Now,
    Q1 Who drinks water?
    Q2 Who owns the zebra?'

Using this plugin: https://github.com/angerman/llm-groq
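To check a model's answer without trusting its transcript, the renamed puzzle can be brute-forced. This is a minimal Python sketch, not what any of the models do internally; `solve` and the variable names are made up for illustration, and each variable holds a hut index from 0 (first hut) to 4:

```python
from itertools import permutations


def next_to(a, b):
    """True when two hut indices are adjacent."""
    return abs(a - b) == 1


def solve():
    """Brute-force clues H01-H15; returns (water drinker, zebra owner)."""
    huts = range(5)  # H01: five huts, 0 = first, 2 = middle
    for purple, scarlet, pink, turquoise, brown in permutations(huts):
        if scarlet != pink + 1:                          # H06
            continue
        for scot, welsh, romanian, brazilian, german in permutations(huts):
            if scot != purple or brazilian != 0:         # H02, H10
                continue
            if not next_to(brazilian, brown):            # H15
                continue
            for kombucha, butterscotch, red_bull, rum, water in permutations(huts):
                if kombucha != scarlet:                  # H04
                    continue
                if romanian != butterscotch or red_bull != 2:  # H05, H09
                    continue
                for old_gold, kools, chesterfields, lucky, parliaments in permutations(huts):
                    if kools != turquoise:               # H08
                        continue
                    if lucky != rum or german != parliaments:  # H13, H14
                        continue
                    for parrot, scorpions, bear, mule, zebra in permutations(huts):
                        if welsh != parrot or old_gold != scorpions:  # H03, H07
                            continue
                        if not next_to(chesterfields, bear):  # H11
                            continue
                        if not next_to(kools, mule):          # H12
                            continue
                        names = {scot: "Scotsman", welsh: "Welshman",
                                 romanian: "Romanian", brazilian: "Brazilian",
                                 german: "German"}
                        return names[water], names[zebra]
    return None


if __name__ == "__main__":
    print(solve())
```

Since the renaming maps Norwegian to Brazilian and Japanese to German, this should return the Brazilian for Q1 and the German for Q2, matching the classic answer under substitution.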

simonw · 2025-02-02T06:51:54 1738479114

Full DeepSeek R1 - accessed through the DeepSeek API (their "deepseek-reasoner" model) - got the right answer: https://gist.github.com/simonw/f77be3bbc720e1314235d42593562...

UncleEntity · 2025-02-02T08:39:54 1738485594

Whatever model is behind chat.deepseek.com got it in 348 seconds.

It amazes me they don't time that thing out after, IDK, 5 minutes of computation time.

RossBencina · 2025-02-02T05:12:27 1738473147

> the puzzle is probably in the training data somewhere

Given that these models can perform translation I'm not sure why you think renaming things is sufficient to put your version out of distribution.

simonw · 2025-02-02T05:26:16 1738473976

I don't - but like I said, I reviewed the thought process in the transcript and it looked legit to me.

I'm not sure what else I could do here to be honest, without coming up with a completely new puzzle that captures the same kind of challenge as the original. I'm not nearly patient enough to do that!

EGreg · 2025-02-02T06:04:57 1738476297

The thought process is basically like asking it to think in French. Same training data LOL

ta12653421 · 2025-02-02T07:55:00 1738482900

ClaudeAI responded: >>> After working through all constraints: Q1: Who drinks water? The German drinks water. Q2: Who owns the zebra? The Scotsman owns the zebra. <<<