People get very hung up on this "autocomplete" idea, but language is a linear stream. How else are you going to generate text except for one token at a time, building on what you have produced already?
That's what humans do after all (at least with speech/language; it might be a bit less linear if you're writing code, but I think it's broadly true).
I generally have an internal monologue turning my thoughts into words; sometimes my consciousness notices the thought fully formed and without needing any words, but when my conscious self decides I can therefore skip the much slower internal monologue, the bit of me that makes the internal monologue "gets annoyed" in a way that my conscious self also experiences due to being in the same brain.
It might actually be linear (how minds actually function is in many cases demonstrably different from how it feels to the mind doing the functioning), but it doesn't feel like it is linear.
The technology doesn't yet exist to measure the facts that generate the feelings to determine whether the feelings do or don't differ from those facts.
Nobody even knows where, specifically, qualia exist in order to be able to direct technological advancement in that area.
But ideas are not. The serialization-format is not the in-memory model.
Humans regularly pause (or insert delaying filler) while converting nonlinear ideas into linear sounds, sentences, etc. That process is arguably the main limiting factor in how fast we communicate, since there's evidence that all spoken languages have a similar bit-throughput, and almost everyone can listen to speech at a faster rate than they can generate it. (And written text is an extension of the verbal process.)
Also, even comparatively simple ideas can be expressed (and understood) with completely different linear encodings: "The dog ate my homework", "My homework was eaten by the dog", and even "Eaten, my homework was, the dog, I blame."
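To make the serialization point concrete, here's a rough sketch (Python, with every name made up for illustration: `Event`, `active`, `passive`). The "idea" is a record of roles with no word order at all; the two functions are just two different linear encodings of it. This is a toy, not a claim about how brains or language models actually represent anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory model: roles only, no inherent word order."""
    agent: str
    patient: str
    verb_past: str
    verb_participle: str


def _sentence(words: str) -> str:
    # Capitalize the first letter and add a full stop.
    return words[0].upper() + words[1:] + "."


def active(e: Event) -> str:
    # Linear encoding 1: agent first.
    return _sentence(f"{e.agent} {e.verb_past} {e.patient}")


def passive(e: Event) -> str:
    # Linear encoding 2: patient first, same underlying event.
    return _sentence(f"{e.patient} was {e.verb_participle} by {e.agent}")


event = Event(agent="the dog", patient="my homework",
              verb_past="ate", verb_participle="eaten")
encodings = [active(event), passive(event)]
for line in encodings:
    print(line)
```

Both strings come from the same `Event`; only the serializer differs.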
Spoken language is linear, but it is a way of displaying hierarchical, nonlinear information. Sign languages occasionally exploit the fact that they aren't constrained by linear order in the same way to do multiple things simultaneously.