No, what is meant is that the next word I speak or write after the current word is not based on a statistical model, but on a world model that includes a language structure based on a defined syntax and cultural variety. I actually mean what I say, while ChatGPT just parrots its weights and produces output based purely on statistics. There is zero modeling that translates into the real world (what we normally call "understanding" and "experience").
Oh, I see. Then I agree with you: an isolated model can't do any world modelling on its own. No matter how large it is, the real world is more complex.
It might be connected to the world, of course. And it might even use toys such as simulators, code execution, math verification and fact checking to further ground itself. I was thinking about the second scenario.
As was said, a different architecture.