
I don't know the details (of course, it's unreleased), but note that MathArena evaluated "average of 4 attempts" and limited token usage to 64k.

OpenAI likely had unlimited tokens, and evaluated "best of N attempts."
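
To make the gap between the two scoring rules concrete, here is a minimal sketch. The outcome data is made up for illustration, and it assumes "best of N" counts a problem as solved if any one attempt is correct; neither MathArena's nor OpenAI's actual scoring code is shown here, and the function names are hypothetical.

```python
from statistics import mean

def average_of_attempts(results):
    """Mean over problems of the fraction of attempts that were correct."""
    return mean(sum(attempts) / len(attempts) for attempts in results)

def best_of_attempts(results):
    """Fraction of problems where at least one attempt was correct.

    Assumption: "best of N" is read as "solved if any attempt succeeds".
    """
    return mean(1.0 if any(attempts) else 0.0 for attempts in results)

# Hypothetical outcomes: 3 problems, 4 attempts each (True = correct).
results = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, False, True],
]

avg_score = average_of_attempts(results)
best_score = best_of_attempts(results)
print(f"average of 4: {avg_score:.3f}")
print(f"best of 4:    {best_score:.3f}")
```

On the same attempts, the best-of score can never be lower than the average score, so numbers reported under the two rules are not directly comparable.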


