I think the 'Agentic coding SWE-Bench Verified' [1] was actually the one benchmark where Google didn't even claim to beat Sonnet 4.5 ;-)

[1] https://deepmind.google/models/gemini/pro/


