In my experience, a RAG-backed LLM will lie to you if your prompt makes unnecessary assumptions or implications. For example, if I say "write about paracetamol curing cancer", the model is likely to make things up. If instead I say "see if there is anything to suggest that paracetamol cures cancer or not", it is much less likely to. This comes from the LLM being tuned to please its user at all costs.
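A minimal sketch of what I mean, assuming the OpenAI Python client; the model name, the system message, and the exact prompt wording are my own illustrative choices, not anything prescribed here:

```python
# Contrast a leading prompt (presupposes the conclusion) with a neutral one
# that explicitly allows "there is no evidence" as an answer.
from openai import OpenAI

client = OpenAI()

LEADING = "Write about paracetamol curing cancer, based on the attached passages."
NEUTRAL = ("Check the attached passages and report whether there is any evidence "
           "that paracetamol cures cancer. If there is none, say so explicitly.")

def ask(prompt: str, context: str) -> str:
    """Send the task plus the retrieved context to the model at temperature 0."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nTask: {prompt}"},
        ],
    )
    return resp.choices[0].message.content
```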
I do love the warnings here... The older I get, the more critical I am of most internet results, except the ones I can tie back to something I've commonly experienced or witnessed myself (which, unfortunately, is exactly the kind of thing AI reproduces well enough to win my trust on that point). I feel the current mix of overly critical thinking and blind faith means flat-earth-type movements might be here to stay until the next generation counters the current direction.
But on the article specifically: I thought RAG's benefit was that the prompt is grounded in "facts" from the provided source documents/vector results, so the LLM's output would always have some canonical reference behind it?
While I’m receptive to the fact that RAGs have performance limitations, and that graph-database-based solutions may avoid hallucinations, wouldn’t your rhetorical position be best served by offering a trial portal where users can upload their own document corpora and see for themselves that prompts to Stardog never result in hallucinations? Otherwise, writing blog posts into the ether will remain unconvincing to your would-be enterprise customers (whose buyers either reference or are among the HN crowd).
The post has details, but it boils down to this: RAG suffers from the same problem as the iPhone's AI-powered notification summaries.
What could work is round-trip verification, like running a serializer/deserializer back to back to check for equality. Run an LLM on the output of the RAG and check whether there is any inconsistency with the retrieved data; in fact, get the LLM to point the inconsistencies out and correct them. [x] Thinking for RAG.
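A rough sketch of that round-trip check, assuming the OpenAI Python client; the model name, prompt wording, and JSON shape are my own illustrative choices:

```python
# Feed the RAG answer and the retrieved chunks back to an LLM and ask it to
# flag unsupported claims and produce a corrected answer.
import json
from openai import OpenAI

client = OpenAI()

def verify_rag_answer(answer: str, retrieved_chunks: list[str]) -> dict:
    """Return {'unsupported_claims': [...], 'corrected_answer': '...'}."""
    context = "\n\n".join(retrieved_chunks)
    prompt = (
        "You are checking a RAG answer against its source passages.\n"
        f"Source passages:\n{context}\n\n"
        f"Answer to check:\n{answer}\n\n"
        "List every claim in the answer that is not supported by the passages, "
        "then rewrite the answer using only supported claims. "
        'Respond as JSON: {"unsupported_claims": [...], "corrected_answer": "..."}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # keep the verifier as deterministic as possible
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)
```

If `unsupported_claims` comes back non-empty, you can surface the list to the user, or loop once more with the corrected answer.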