It successfully argues that LLMs are limited in usefulness without access to ground truth.
But that’s not the whole story!
Giving LLMs the ability to check their assertions, e.g. by emitting and executing code to see whether reality matches their word-vomit, or by researching online - I wish the author had discussed how much of a game changer that is.
Yes, I know I’m “only” talking about agents - “LLMs with tools and a goal, running in a loop”.
But adding ground truth takes you out of the loop. That’s super powerful. Make it so the LLM can ask something other than you to point out the extra R in “strawberry” that it missed. In code we have code-writing agents, but other industries can benefit from the same idea - a creative-writing agent could be given a grammar checker, for example.
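To make the “takes you out of the loop” point concrete, here’s a minimal sketch in Python. The llm_answer function is a hypothetical stand-in (stubbed so the example runs), not any particular model API; the interesting part is that the verifier is ordinary code the agent executes itself, so correction happens without a human:

    # Sketch of "ground truth in the loop". llm_answer is a hypothetical
    # placeholder for a real model call; count_letter is the verifier.

    def llm_answer(question: str, feedback: str | None = None) -> int:
        # Stub: pretend the model first claims 2, then corrects after feedback.
        return 2 if feedback is None else 3

    def count_letter(word: str, letter: str) -> int:
        return word.lower().count(letter)  # ground truth: actually run the check

    def answer_with_verification(word: str, letter: str, max_rounds: int = 3) -> int:
        question = f"How many '{letter}'s are in '{word}'?"
        feedback = None
        for _ in range(max_rounds):
            claimed = llm_answer(question, feedback)
            actual = count_letter(word, letter)   # check the claim against reality
            if claimed == actual:
                return claimed                    # verified, no human in the loop
            feedback = f"You said {claimed}, but executing the check gives {actual}."
        return actual                             # fall back to the verified value

    print(answer_with_verification("strawberry", "r"))  # -> 3

The same shape works for the grammar-checker idea: swap the letter-counting verifier for any tool that can say “this doesn’t match reality” and feed that back to the model.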
That lets the thing do more on its own, and because you trust its output a lot more, you can use it for more things.
Yes - plain LLMs are stream-of-consciousness machines and basically emit bullshit, but that bullshit is often only a few minor corrections away from being highly useful, autonomously produced output.
They just need to validate against consensus reality to become insanely more useful than they are alone.