Hacker News | holomorphiclabs's comments

Thank you! And yes, huge fan of r/LocalLLaMA :)


Thank you, we have been exploring this.


Our findings are that RAG does not generalize well when critical understanding is spread across a large corpus of information. We do not think it is a question of either context length or retrieval quality. In our case, the understanding is very clearly being captured within the model weights themselves.


Does that mean you tested on specific questions? Take 1-5 typical queries and test them against a properly configured LlamaIndex setup.

If your documents repeat the same information in several different ways, then you actually might get something out of LoRA on raw documents. But you need a way to measure it, and you have to verify with real tests first that RAG won't work.
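A minimal sketch of that measurement step: score any query->answer pipeline against a few typical queries with known facts. The `answer_fn` here is a stand-in for your real pipeline (e.g. a wrapper around a configured LlamaIndex query engine); the stub and test cases below are hypothetical.

```python
from typing import Callable

def evaluate_pipeline(
    answer_fn: Callable[[str], str],
    test_cases: list[tuple[str, list[str]]],
) -> float:
    """Return the fraction of expected facts that appear in the answers."""
    hits, total = 0, 0
    for query, expected_facts in test_cases:
        answer = answer_fn(query).lower()
        for fact in expected_facts:
            total += 1
            if fact.lower() in answer:
                hits += 1
    return hits / total if total else 0.0

# Hypothetical test cases and a stub standing in for a real RAG pipeline:
cases = [
    ("When was the product launched?", ["2021"]),
    ("Who leads the infra team?", ["alice"]),
]
stub = lambda q: "It launched in 2021." if "launched" in q else "Unknown."
score = evaluate_pipeline(stub, cases)  # 0.5: one of two facts recovered
```

Run the same harness against the RAG baseline and against the LoRA-tuned model so the comparison is apples to apples.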

To train effectively with LoRA, though, and expect it to pick up most of the information reliably, you need to cover the knowledge and skills with multiple question-answer pairs for each item you expect it to learn. You can then use those QA pairs to validate that it actually learned them.

But it's a lot of QA pair generation.
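That generation-plus-validation loop can be sketched as follows. The facts and question templates here are hypothetical; in practice you would generate the paraphrased questions with an LLM. One paraphrase per fact is held out for post-training validation, the rest become training records.

```python
import json
import random

# Hypothetical: facts extracted from the corpus, each with several
# paraphrased questions (an LLM would generate these in practice).
facts = {
    "launch_year": ("2021", [
        "When was the product launched?",
        "What year did the product ship?",
        "In which year was the initial release?",
    ]),
}

random.seed(0)
train, held_out = [], []
for fact_id, (answer, questions) in facts.items():
    qs = questions[:]
    random.shuffle(qs)
    # Hold out one paraphrase per fact for validation after training.
    held_out.append({"id": fact_id, "question": qs[0], "answer": answer})
    for q in qs[1:]:
        train.append({"question": q, "answer": answer})

train_jsonl = "\n".join(json.dumps(r) for r in train)
# After LoRA training: ask each held-out question and check that the
# expected answer string appears in the model's response.
```

Because the held-out question is a paraphrase the model never saw, passing it is evidence the fact was learned rather than memorized verbatim.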


We are finding there is a trade-off between model performance and hosting costs post-training. The optimal outcome is a model that performs well on next-token prediction (and some other in-house tasks we've defined) and that we can host on the lowest-cost hosting provider rather than being locked in. I think we'd only go the proprietary-model route if the model really were that much better. We're just trying to save ourselves weeks or months of benchmarking time and cost if there is already an established option in this space.
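That selection rule (good enough on in-house evals, then cheapest to host) can be sketched in a few lines. The candidate names, scores, and costs below are entirely made up for illustration.

```python
# Hypothetical candidates: (name, in-house eval score, $ per 1M tokens hosted).
candidates = [
    ("proprietary-large", 0.92, 15.00),
    ("open-weights-8b",   0.88,  0.60),
    ("open-weights-70b",  0.91,  2.40),
]

def pick_model(candidates, min_score):
    """Cheapest model that clears the quality bar, or None if none do."""
    qualified = [c for c in candidates if c[1] >= min_score]
    return min(qualified, key=lambda c: c[2]) if qualified else None

choice = pick_model(candidates, min_score=0.90)  # -> open-weights-70b
```

Filtering by quality first and only then minimizing cost encodes the point that a slightly better proprietary model doesn't win unless it clears a bar the cheaper hosted models can't.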


Are there any that are recommended? Honestly, we would rather not share data with any third-party vendors. It's been a painstaking process to curate it.

