Evals are showing that reasoning (by which I mean multi-step problem solving, planning, etc.) is improving over time in LLMs. We don't have to agree on metaphysics to see this; I'm referring to the measurable end result.
Why? Some combination of longer context windows, better architectures, hybrid systems, and so on. There is also a growing body of research on how and where reasoning happens (inside the transformer, during the chain of thought, perhaps during a tool call).
I believe the fact that you edited your post after my reply, then disingenuously left this reply, makes it quite plain that you know exactly what my criticism meant.