Say I have a huge corpus of unstructured data. What's the most effective way to get a model that can produce great answers from that data?
Is RAG still the way to go? Should one also fine-tune the model on that data?
It seems that getting RAG to work well requires a lot of optimization. Are there any drag-and-drop solutions that work well? I know the OpenAI Assistants API has built-in knowledge retrieval; does anyone have experience with how good that is compared to other methods?
Or is it better to pre-train a custom model and instruction-tune it?
Would love to know what you guys are all doing!
I'd say RAG is still very much the way to go. What you then need to do is optimize how you chunk and embed data into the RAG database. Pinecone has a good post on this[1], and I believe others[2] are working on more automated solutions.
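Very roughly, the indexing side is something like this. This is a minimal sketch, not a recommendation: sentence-transformers with a naive fixed-size chunker, where the model name, chunk size, and overlap are arbitrary placeholders, and the in-memory array stands in for a real vector DB like Pinecone.

    from sentence_transformers import SentenceTransformer

    def chunk_text(text, chunk_size=500, overlap=100):
        # fixed-size character windows with overlap so ideas
        # aren't cut off cold at chunk boundaries
        chunks = []
        start = 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            start += chunk_size - overlap
        return chunks

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
    corpus = open("corpus.txt").read()               # stand-in for your unstructured data
    chunks = chunk_text(corpus)
    embeddings = model.encode(chunks, normalize_embeddings=True)  # (n_chunks, 384)

Most of the optimization the OP is asking about lives in that chunk_text function: chunking by sentence/section instead of raw characters, picking sizes that match your content, attaching metadata, etc. That's what the Pinecone post[1] goes into.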
If you want a more generalized framing: what state-of-the-art (SOTA) systems seem to be doing is giving the LLM a "second brain" to obtain information from. This can be in the form of RAG, as above, or in the form of more complex and rigorous models. For example, AlphaGeometry[3] uses an LLM combined with a geometry theorem prover to find solutions to problems.
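In the RAG case, the "second brain" lookup at query time is just nearest-neighbor search over those embeddings before you prompt the model. Continuing the sketch above (brute-force cosine similarity here is a stand-in for whatever index a real vector DB gives you):

    import numpy as np

    def retrieve(query, k=3):
        # embeddings are unit-norm, so a dot product is cosine similarity
        q = model.encode([query], normalize_embeddings=True)[0]
        top = np.argsort(embeddings @ q)[::-1][:k]
        return [chunks[i] for i in top]

    context = "\n---\n".join(retrieve("your question here"))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: your question here"
    # send `prompt` to whatever chat model you're using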
[1] https://www.pinecone.io/learn/chunking-strategies/
[2] https://unstructured.io/
[3] https://deepmind.google/discover/blog/alphageometry-an-olymp...