I’m curious whether the model has gotten more consistent across the full context window. It’s something OpenAI touted in the release, and I’m curious whether it will make a difference for long-running tasks or big code reviews.
One positive is that 5.2 is very good at finding bugs, but I'm not sure about throughput. I'd imagine it might be improved, but I haven't seen a real task to benchmark it on.
What I'm curious about is 5.2-codex, but many of us complained about 5.1-codex (it seemed to get tunnel-visioned), so I have been using vanilla 5.1.
It's just getting very tiring to deal with five different permutations of three completely separate models, but perhaps that's the intent: keep you chasing.