lol I love how OpenAI just straight up doesn't compare their model to others on ...

qwesr123 · 2025-12-18T19:06:51 1766084811

Not sure why they don't compare with others, but they are actually leading on the benchmarks they published. See here (bottom) for a chart comparing to other models: https://marginlab.ai/blog/swe-bench-deep-dive/

mistercheph · 2025-12-18T19:34:27 1766086467

It's like apple, they just don't want users or anyone to even be thinking of their competitors, the competition doesn't exist, it's not relevant.

whimsicalism · 2025-12-18T20:02:35 1766088155

is swe-bench saturated? or they switch to swe-bench pro because...?

Mkengin · 2025-12-18T22:45:22 1766097922

At least on swe-rebench it does pretty well: https://swe-rebench.com/

dbbk · 2025-12-18T20:14:02 1766088842

This was the one thing I scanned for. No comparison against Opus. See ya.

Mkengin · 2025-12-18T22:49:10 1766098150

Though this Codex version isnt on the leaderboard, GPT-5.2-Medium already seems to be a bit better than Opus 4.5: https://swe-rebench.com/

gizmodo59 · 2025-12-18T23:36:50 1766101010

Is that your website or something? You keep promoting it

Mkengin · 2025-12-21T13:48:02 1766324882

No, I am not affiliated with the website, I just want to see more discussions based on uncontaminated benchmarks and feel that people rely too much on benchmarks that companies can conduct themselves. If that is the case, I don't feel I can trust them. For general LLM capabilities, for example, I would also tend to rely on dubesor [1] rather than artificial analysis or similar leaderboards.

[1] https://dubesor.de/benchtable