The benchmarks are very impressive. Codex and Opus 4.5 are really good coders already and they keep getting better.
No wall yet, and I think we might have already crossed the threshold of models being as good as or better than most engineers.
GDPval will be an interesting benchmark and I'll happily use the new model to test spreadsheet (and other office work) capabilities. If they can keep going like this just a little bit further, many office workers will stop being useful... I don't know yet how to feel about this.
Great for humanity probably, but for the individuals?
Because from my experience using Codex in a decently complex C++ environment at work, it works REALLY well when it has things to copy. Refactorings, documentation, code review etc. all work great. But those things only help actual humans and they also take time. I estimate that in a good case I save ~50% of time, in a bad case it's negative and costs time.
But what I generally found is that it's not that great at writing new code. Obviously an LLM can't think and you notice that quite quickly: it doesn't create abstractions, use abstractions or try to find general solutions to problems.
People who get replaced by Codex are those who do repetitive tasks in a well understood field. For example, making basic websites, very simple CRUD applications etc.
I think it's also not layoffs, but rather that companies will hire fewer freelancers or people to manage small IT projects.
It was only about 2-3 weeks ago that several HNers told me "nah you better re-check your code" when I explained that I have over 2 decades of coding xp, yet have not manually edited code (in memory) for the last 6 or so months, whilst doing daily 12-hour vibe code seshes.
It really depends on the complexity of code. I've found models (codex-5.1-max, opus 4.5) to be absolutely useless writing shaders or ML training code, but really good at basic web development.
Interesting, I've been using Claude Max with UE5 and while it isn't _brilliant_ with shaders I can usually get it to where I want. Also had a bit of success with converting HLSL shaders to GLSL with it.
Do you have any examples, or is your project OSS or anything like that? Because I want to believe, but I have people I work with who say and try the same thing (no manual coding), and their work is now terrible.
I've finally fixed some massive issues in projects that were taking me literally years. I'll be super happy to share once they are ready (I can't really show my trading app, but the game should be fine as soon as I do).
I have canceled my Claude Max subscription because Sonnet 4.5 is just too unreliable. For the rest of the month I'm using Opus 4.1 which is much better but seems to have much lower usage limits than before Sonnet 4.5 was released. When I hit 4.1 Opus limits I'm using Codex. I will probably go through with the Codex pro subscription.
Glad I'm not imagining it, I'll be cancelling my sub. Paying for things only for them to get worse and the provider hoping I don't notice is such a fucking vile tactic.
In my experience sonnet 4.5 is basically pointless, it often gets non-trivial tasks wrong, and for trivial tasks I can use a local model or one of the myriad of providers that give free inference.
EDIT: Holy shit I read the github issue, fuck these people.
> We highly recommend Sonnet 4.5 -- Opus uses rate limits faster, and is not as capable for coding tasks.
Don't try to comprehend the hive mind brother, there are a lot of shills and fanboys in addition to a lot of great people on this forum, sometimes the variance looks pretty bad.
I hope the people downvoting get some minor joy out of it, I know you need it.
Sonnet 4.5 is way worse than Opus 4.1 -- it's incredible that they claim it's their best coding model.
It's obvious if you've used the two models for any sort of complicated work.
Codex with GPT-5 codex (high thinking) is better than both by a long shot, but takes longer to work. I've fully switched to Codex, and I used Claude Code for the past ~4 months as a daily driver for various things.
I only reach for Sonnet now if Codex gets cagey about writing code -- then I let Sonnet rush ahead, and have Codex align the code with my overall plan.
If cost is not considered, absolutely. That being said, Sonnet 4.5 with thinking where it makes sense feels like way more bang for your buck and is usually good enough. I really don't use Opus anymore.
They also managed to introduce regressions into WebKit so that the visual and touch positions of fixed input elements diverge. Really makes you question what’s going on at Apple.
Pushing out an exact way to extract that data without giving the creator time to fix it may even be worse than using such code in production. The data may then be in the hands of malicious people who wouldn’t have found it otherwise.
There are security advisories, but the feature isn't particularly good. Non-actionable stuff is mixed in with actionable stuff and actionable stuff is IMO presented too generically.