I'm curious about getting one of these to run LLMs locally, but I don't understand the cost-benefit very well. Even 128GB can't run a state-of-the-art model like Claude 3.5 or GPT-4o, right? Conversely, even 16GB can (I think?) run a smaller, quantized Llama model. What's the sweet spot for running a capable model locally (and for likely future local-scale models)?


You'll be able to run 72B models w/ large context, lightly quantized, with decent-ish performance, like 20-25 tok/sec. The best of the bunch are maybe 90% of a Claude 3.5.
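For a rough sense of why that fits in 128GB, here's a back-of-envelope memory estimate; the bits-per-weight and KV-cache figures below are assumptions for a typical Q4-class quant, not measurements:

    # Back-of-envelope RAM estimate for a lightly quantized 72B model.
    # Assumptions (illustrative only): ~4.8 bits/weight for a Q4_K_M-class
    # quant, and roughly 0.3 MB of fp16 KV cache per token of context.
    params = 72e9
    bits_per_weight = 4.8
    weights_gb = params * bits_per_weight / 8 / 1e9    # ~43 GB of weights
    context_tokens = 32_000
    kv_cache_gb = context_tokens * 0.3 / 1000          # ~10 GB of KV cache
    total_gb = weights_gb + kv_cache_gb
    print(f"~{total_gb:.0f} GB")                       # roughly 53 GB, well under 128 GB

The rest of the 128GB is headroom for the OS, inference buffers, and whatever else you're running.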

If you need to do some work offline, or the place you work blocks access to cloud providers for some reason, it's not a bad way to go, really. Note that if you're on battery, heavy LLM use can drain it in about an hour.
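If you want to try the offline route, a minimal sketch using llama-cpp-python is below; the model filename and context size are placeholders for whatever GGUF quant you actually download:

    # Minimal local inference with llama-cpp-python (pip install llama-cpp-python).
    # The model path is hypothetical; point it at any GGUF quant on disk.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/qwen2.5-72b-instruct-q4_k_m.gguf",  # placeholder file
        n_ctx=8192,        # context window; larger contexts cost more RAM for KV cache
        n_gpu_layers=-1,   # offload all layers to the GPU / unified memory
    )

    out = llm("Explain the tradeoffs of running LLMs locally.", max_tokens=256)
    print(out["choices"][0]["text"])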


Lots of discussion and testing of that over on https://www.reddit.com/r/LocalLLaMA/, worth following if you're not already.


Claude 3.5 and GPT-4o are huge models. They don't run on consumer hardware.



