I compared my daughter against SOTA models on math puzzles

przadka · 2025-02-02T13:58:16 1738504696

I tested o3, r1, 4o and other SOTA models against puzzles from an international math competition and compared their performance with my 11-year-old daughter's solutions. Full results include detailed conversations with each model and complete methodology.

ukituki · 2025-02-02T15:24:54 1738509894

Interesting how the reasoning differs between models, e.g. DeepSeek trying brute-force tricks.

michalwarda · 2025-02-02T14:03:11 1738504991

Very cool post! I wonder how much it will affect the psychology of the next generations.