Broadly, I think other open source solutions are lacking in (1) integration of external knowledge into the chat, (2) simple UX, and (3) complex "agent" flows.
Both internal RAG and web search are hard to do well, and since we started as an enterprise search project, we've spent a lot of time making them good.
Most (all?) of these projects have UXes that are quite complicated (e.g. exposing every model param like Top P front-and-center without any explanation, no clear distinction between admin and regular-user features, etc.). For broader deployments this can overwhelm people who are new to AI tools.
Finally, trying to do anything beyond a simple back-and-forth with a single tool call isn't great with a lot of these projects. So something like "find me all the open source chat options, understand their strengths/weaknesses, and compile that into a spreadsheet" will work well with Onyx, but not so well with other options (again partially due to our enterprise search roots).
Looking forward to your progress! Just checked the paper, and it says the underlying backbone is still DETR. My guess would be that SAM3 uses more video frames during training, which diluted the sparse, engineering-paper-like data.
1. The chat context is always provided, and that introduces a bit of uncertainty: when the chat history mentions something, the model is always inclined to connect to it.
2. When I set each context to an empty string, the model showed no evidence of remembering concepts. I told it five times that I love cats, and when asked about its favorite animal, its output remained "honeybee" and "octopus".
I can’t decide whether I’m skeptical of the entire concept. I do believe that mixing an EMA of vectors into the network will do something, so I’m surprised you didn’t get at least a change in animals after talking about cats. But I’m not clear that reweighting logits at the end is all that useful. I guess this is supposed to be a realtime LoRA of sorts, but then what do you have except a severely undertrained LoRA, trained on nothing but whatever conversations you’ve had?
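For what it's worth, here's a minimal sketch of the mechanism as I understand it, in Python; all the names here (memory_vector, update_memory, reweight_logits, decay, strength) are mine, not the project's, and the shapes are made up:

```python
import torch

vocab_size, dim = 50_000, 512
decay = 0.99  # EMA decay: higher = slower to absorb new conversations

# Output embedding matrix (vocab_size x dim), as in most LMs.
output_embeddings = torch.randn(vocab_size, dim)

# Running EMA over hidden states seen in past conversations.
memory_vector = torch.zeros(dim)

def update_memory(hidden_state: torch.Tensor) -> None:
    """Fold one conversation's hidden state into the running average."""
    global memory_vector
    memory_vector = decay * memory_vector + (1 - decay) * hidden_state

def reweight_logits(logits: torch.Tensor, strength: float = 0.1) -> torch.Tensor:
    """Bias logits toward tokens whose embeddings align with the memory."""
    bias = output_embeddings @ memory_vector  # (vocab_size,)
    return logits + strength * bias

logits = torch.randn(vocab_size)
biased = reweight_logits(logits)
```

If that's roughly what's happening, then memory_vector is a single direction averaged over everything, so a handful of cat mentions would barely move it, which would be consistent with the null result above.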
Looks pretty good, but all CJK characters are displayed as question marks (???), and switching to a native CJK font doesn't help. So my user folder is now crowded with ???????? entries, which makes it hard to navigate :(
For Llama 3, just ask him to install Ollama and serve the model. Ollama has automatic memory management and will free the model when it's not in use, and whenever you make a call to the API (do let your friend know before you do this), Ollama will load the model back into memory (minimal example below).
Not sure whether there's anything similar for SD, though.
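As a concrete example, once Ollama is serving (it listens on localhost:11434 by default), a call like this will trigger the reload; the model tag and prompt are just placeholders:

```python
import requests

# Ollama exposes an HTTP API on localhost:11434 by default. This call will
# load llama3 back into memory if it was freed, then generate a response.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",       # whatever tag your friend pulled
        "prompt": "Say hello.",  # placeholder prompt
        "stream": False,         # return one JSON object instead of a stream
    },
    timeout=300,  # the first call after an unload is slow while it reloads
)
print(resp.json()["response"])
```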
I wonder how they would calculate the metrics if the result is generated instead of retrieved. Is it likely that the LLM can generate exactly the same output as the desired result?
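To make the concern concrete, here's a toy sketch of why exact match breaks down on generated output, with token-level F1 (a common QA metric, simplified here without punctuation normalization) as one alternative; the strings are made up:

```python
def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1, roughly as in SQuAD-style QA evaluation."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

gold = "The Eiffel Tower is 330 metres tall."
generated = "It stands 330 metres tall (the Eiffel Tower)."

print(generated == gold)                     # False: exact match fails
print(round(token_f1(generated, gold), 2))   # 0.4: partial credit instead
```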
You can compile Qt to target the browser (Qt for WebAssembly) instead of native apps. But beyond seeing a demo, I don't know how good it is in terms of performance, accessibility, and so on.