arendtio 47 days ago | on: Claude Opus 4.5

I think the 'Agentic coding SWE-Bench Verified' [1] was actually the one benchmark where Google didn't even claim to beat Sonnet 4.5 ;-)

[1] https://deepmind.google/models/gemini/pro/