raincole 5 months ago | on: Evaluating publicly available LLMs on IMO 2025

I don't know the details (of course, it's unreleased), but note that MathArena evaluated "average of 4 attempts" and limited token usage to 64k. OpenAI likely had unlimited tokens and evaluated "best of N attempts."
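Here is a toy illustration of why the two protocols aren't directly comparable. The numbers are made up and come from neither MathArena nor OpenAI. On the same set of attempts, the best score can never be lower than the average. The sketch assumes best-of-N picks the top attempt with perfect hindsight, which may not match how OpenAI actually selected answers.

```python
# Hypothetical per-attempt scores (0-7 points, IMO-style) for a single problem.
# Made-up numbers for illustration only.
attempts = [2, 7, 0, 5]


def average_of_attempts(scores):
    """Mean score across all attempts ("average of k attempts" style)."""
    return sum(scores) / len(scores)


def best_of_n(scores):
    """Highest score across attempts, assuming an oracle picks the best one."""
    return max(scores)


print(average_of_attempts(attempts))  # 3.5
print(best_of_n(attempts))            # 7
```

With identical attempts, one protocol reports 3.5/7 and the other 7/7. A token cap would widen the gap further if it cuts off some long solutions.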