That’s the correct answer. Years ago I worked on inference efficiency on edge hardware at a startup. Time after time I saw that users vastly prefer slower but more accurate and robust systems. Put succinctly: nobody cares how quick a model is if it doesn’t do a good job. Another thing I discovered is that it can be very difficult to convince software engineers of this obvious fact.
I can see why most people would prefer a better but slower model when the price is equal, but I'm sure many would prefer a worse $2/mo model to a better $20/mo one.
That’s the thing I’m finding so hard to explain. Nobody would ever pay even $2 for a system that is worse at solving the problem. There is a baseline level of compute you need to serve certain types of models. Going below that level to cut cost at the expense of accuracy and robustness is a fool’s errand.
With LLMs it’s even worse. To make it concrete: for how I use LLMs, not only will I not pay for anything less capable than GPT-4, I won’t even use it for free. Other LLMs might perform well on narrow problems after fine-tuning, but even then I’d prefer the model with the best metrics, not the lowest inference cost.
It isn’t capable unless you have a very specialized task and carefully fine-tune it to solve just that task. GPT-4 covers a lot of ground out of the box. The best model I’ve seen so far on the FOSS side, Mixtral MoE, is less capable than even GPT-3.5. I often submit my requests to both Mixtral and GPT-4. If I’m problem solving (learning something, working with code, summarizing, working on my messaging), Mixtral is nearly always a waste of time in comparison.
Again, that’s precisely what I’m saying. A bounded task is best executed on the smallest possible model at the greatest possible speed. This holds for business reasons ($$$) as well as environmental ones (smaller model -> less carbon).
LLMs are not AGI; they are tools with specific uses we are still discovering.
If you aren’t measuring accuracy to begin with, and are just saying “I’ll run the most expensive thing and assume it’s better” with zero evaluation, you’re wasting money and time and hurting the environment.
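A minimal sketch of what “not zero evaluation” could look like, assuming you have a small labeled set of examples from your bounded task and a rough per-call cost for each candidate model. The `Candidate` wrapper, the toy classifiers, and the cost numbers are hypothetical placeholders, not any real model API; in practice `predict` would call your fine-tuned small model or a hosted large one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    """A model under consideration (hypothetical wrapper, not a real API)."""
    name: str
    cost_per_call: float           # assumed relative cost of one request
    predict: Callable[[str], str]  # maps an input text to a predicted label


def accuracy(model: Candidate, examples: Sequence[Tuple[str, str]]) -> float:
    """Fraction of labeled examples the model gets exactly right."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in examples if model.predict(text) == label)
    return correct / len(examples)


def cheapest_adequate(
    candidates: List[Candidate],
    examples: Sequence[Tuple[str, str]],
    min_accuracy: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate whose measured accuracy meets the bar,
    or None if no candidate is good enough."""
    for model in sorted(candidates, key=lambda c: c.cost_per_call):
        if accuracy(model, examples) >= min_accuracy:
            return model
    return None


def keyword_sentiment(text: str) -> str:
    """Toy stand-in for a small fine-tuned classifier (hypothetical)."""
    return "pos" if "great" in text or "love" in text else "neg"


if __name__ == "__main__":
    # Made-up labeled set for a bounded sentiment task.
    examples = [
        ("great product", "pos"),
        ("terrible service", "neg"),
        ("love it", "pos"),
        ("awful", "neg"),
    ]
    small = Candidate("small", 0.1, keyword_sentiment)
    large = Candidate("large", 10.0, keyword_sentiment)  # same answers, 100x the cost
    chosen = cheapest_adequate([large, small], examples, min_accuracy=0.9)
    print(chosen.name if chosen else "no model meets the bar")  # prints: small
```

The design choice is simply to try candidates in ascending cost order and stop at the first one that clears the bar, so the expensive model only wins when the cheaper ones measurably fall short on your own data.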
Also, I don’t even like running Mistral if I can avoid it; a lot of tasks can be done with a fine-tuned BERT or DistilBERT. It takes more work, but my custom BERT models way outperform GPT-4 on bounded tasks because I have highly curated training data.
Within specialized domains, you just aren’t going to see GPT-4/5/6 perform on par with models trained on expert-curated data.
Also, all the evidence is in this thread. People are clearly unhappy about wasting time on LLMs, and the time was wasted because of obviously bad output.
People think LLMs are all or nothing: either god-like AGI or useless “hallucinating”.
In reality you have to know the strengths and weaknesses of any tool, and small, fast LLMs can do a tremendous amount within a fixed scope. The people at Mistral get this.
Yes, but for certain classes of problems small LLMs are highly performant, in many cases equal to GPT-4. Sure, GPT-4 can do more things well, but 2+2 is gonna be 4 no matter what. You don’t need a tank to drive to the grocery store, just a small car with a trunk.
So the assertion that small models aren’t as good just isn’t correct. They are amazing at certain things, and they are dramatically faster and cheaper than larger models.