That’s the correct answer. Years ago I worked on inference efficiency on edge ha...

Al-Khwarizmi · on Feb 14, 2024

Less compute also means lower cost, though.

I see how most people would prefer a better but slower model when price is equal, but I'm sure many prefer a worse $2/mo model over a better $20/mo model.

ein0p · on Feb 14, 2024

That’s the thing I’m finding so hard to explain. Nobody would ever pay even $2 for a system that is worse at solving the problem. There is some baseline compute you need to deliver certain types of models. Going below that level for lower cost at the expense of accuracy and robustness is a fool’s errand.

In LLMs it’s even worse. To make it concrete: for how I use LLMs, I will not only not pay for anything with less capability than GPT4, I won’t even use it for free. It could be that other LLMs perform well on narrow problems after fine-tuning, but even then I’d prefer the model with the highest metrics, not the lowest inference cost.

sjwhevvvvvsj · on Feb 14, 2024

So I think that’s a “your problem isn’t right for the tool” issue, not a “Mistral isn’t capable” issue.

ein0p · on Feb 14, 2024

It isn’t capable unless you have a very specialized task and carefully fine-tune to solve just that task. GPT4 covers a lot of ground out of the box. The best model I’ve seen so far on the FOSS side, Mixtral MoE, is less capable than even GPT-3.5. I often submit my requests to both Mixtral and GPT4. If I’m problem solving (learning something, working with code, summarizing, working on my messaging), Mixtral is nearly always a waste of time in comparison.

sjwhevvvvvsj · on Feb 14, 2024

Again, that’s precisely what I’m saying. A bounded task is best executed against the smallest possible model at the greatest possible speed. This is true for business factors ($$$) as well as environmental (smaller model -> less carbon).

LLMs are not AGI; they are tools with specific uses that we are still discovering.

If you aren’t trying to optimize your accuracy to start with and are just saying “I’ll run the most expensive thing and assume it is better” with zero evaluation, you’re wasting money and time, and hurting the environment.

Also, I don’t even like running Mistral if I can avoid it - a lot of tasks can be done with a fine-tune of BERT or DistilBERT. It takes more work, but my custom BERT models way outperform GPT-4 on bounded tasks because I have highly curated training data.

Within specialized domains you just aren’t going to see GPT-4/5/6 performing on par with models trained on expert-curated data.

spacecadet · on Feb 14, 2024

Having spent time on edge compute projects: this.

Also, all the evidence is in this thread. Clearly people are unhappy about wasting time on LLMs, when the wasted time was the result of obviously bad output.

sjwhevvvvvsj · on Feb 14, 2024

People think LLMs are all or nothing, as if it’s either god-like AGI or useless “hallucinating”.

In reality you have to know the strengths and weaknesses of any tool, and small, fast LLMs can do a tremendous amount within a fixed scope. The people at Mistral get this.

sjwhevvvvvsj · on Feb 14, 2024

Yes, but for certain classes of problems small LLMs are highly performant, in many cases equal to GPT-4. Sure, GPT-4 can do more things well, but 2+2 is gonna be 4 no matter what. You don’t need a tank to drive to the grocery store, just a small car with a trunk.

So the assertion that small models aren’t as good just isn’t correct. They are amazing at certain things, and are far faster and cheaper than larger models.