I'm curious about getting one of these to run LLMs locally, but I don't understand the cost-benefit very well. Even 128GB can't run, like, a state-of-the-art Claude 3.5 or GPT-4o class model, right? Conversely, even 16GB can (I think?) run a smaller, quantized Llama model. What's the sweet spot for running a capable model locally (and likely future local-scale models)?
You'll be able to run 72B models with large context, lightly quantized, at decent-ish performance, like 20-25 tok/sec. The best of the bunch are maybe 90% of a Claude 3.5.
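For a rough sense of why 128GB covers a lightly quantized 72B with a big context window, here's a back-of-envelope sketch. The layer/head numbers are just illustrative assumptions (roughly a Qwen2.5-72B-shaped model with GQA at ~Q4 quantization), not measurements:

```python
# Back-of-envelope memory estimate for running a quantized LLM locally.
# All configuration numbers are illustrative assumptions, not specs.

def model_memory_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (keys + values, fp16 by default)."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return context_tokens * per_token_bytes / 1e9

weights = model_memory_gb(72, 4.8)  # ~4-bit GGUF-style quantization
kv = kv_cache_gb(32_768, layers=80, kv_heads=8, head_dim=128)
print(f"weights ~{weights:.0f} GB, 32k-token KV cache ~{kv:.0f} GB")
# -> roughly 43 GB of weights plus ~11 GB of KV cache, which fits
#    comfortably in 128 GB of unified memory; a 16 GB machine is stuck
#    with much smaller or more heavily quantized models.
```

Generation speed is mostly memory-bandwidth-bound at batch size 1, so the weight footprint above is also the main thing that determines tok/sec.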
If you need to do some work offline, or the place you work blocks access to cloud providers for some reason, it's not a bad way to go, really. Note that heavy LLM use can drain the battery in about an hour if you're unplugged.