Hacker News | notemap's comments

Don't trust it. Problem B required an algorithm to run for inputs up to N ~ 200, and a clever graph theory lemma, before succumbing to pattern matching / law of larger numbers. Claiming that there's a pattern for N < 20 seems like classic AI slop.

EDIT: Just submitted it, WA. Yeah.

The problems are hard enough that consumer models can't solve all 12.


If we're benchmarking problems, would you mind trying this one on Pro, assuming you're willing to spare the compute?

https://www.acmicpc.net/problem/33797

I have the $20 plan and I think I found a weird bug, at least with the thinking version. It gets stuck in the same local minimum super quickly, even though the "fake solution" is easily disproved on random tests.

It's at the point where sometimes I've fed it the editorial and it still converges to the fake solution.

https://chatgpt.com/share/68c8b2ef-c68c-8004-8006-595501929f...

I'm sure the model is capable of solving it. But seriously, I've tried across multiple generations (since about when o3 came out) to get GPT to solve this problem, and I don't think it's being held back by its innate ability; it literally just refuses to think critically about the problem. Maybe with better prompting it doesn't get stuck as hard?
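
For anyone unfamiliar with what "disproved on random tests" means here: the usual approach is a stress test that compares the candidate against a slow brute force on small random inputs. A minimal sketch follows. It uses a toy problem (maximum subarray sum) as a stand-in, since the linked problem's statement isn't reproduced here; `brute`, `fake`, and `find_counterexample` are hypothetical names for illustration, not anything from the editorial.

```python
import random


def brute(a):
    """Slow reference: best sum over all non-empty contiguous subarrays."""
    best = a[0]
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]
            best = max(best, running)
    return best


def fake(a):
    """Plausible-looking but wrong heuristic (toy stand-in for a 'fake solution'):
    add up every positive element, ignoring contiguity."""
    positives = [x for x in a if x > 0]
    return sum(positives) if positives else max(a)


def find_counterexample(candidate, reference, trials=1000, seed=0):
    """Return a small random input where candidate and reference disagree,
    or None if no disagreement shows up within `trials` attempts."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)                        # tiny inputs keep brute force cheap
        a = [rng.randint(-5, 5) for _ in range(n)]   # small values make failures readable
        if candidate(a) != reference(a):
            return a
    return None


if __name__ == "__main__":
    cex = find_counterexample(fake, brute)
    if cex is None:
        print("no counterexample found")
    else:
        print("input:", cex, "candidate:", fake(cex), "reference:", brute(cex))
```

Keeping the inputs tiny is deliberate: a wrong heuristic usually fails on something you can check by hand, which makes the counterexample easy to paste back into the conversation.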


Solve one of my problems :)

https://www.acmicpc.net/problem/33797

Try with an LLM too :)


403 Forbidden :(


