I'm curious about getting one of these to run LLMs locally, but I don't understand the cost-benefit very well. Even 128GB can't run, like, a state-of-the-art Claude 3.5 or GPT-4o class model, right? Conversely, even 16GB can (I think?) run a smaller, quantized Llama model. What's the sweet spot for running a capable model locally (and likely future local-scale models)?
You'll be able to run 72B models with large context, lightly quantized, at decent-ish performance, like 20-25 tok/sec. The best of the bunch are maybe 90% of a Claude 3.5.
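For a rough sense of why 128GB covers a lightly quantized 72B with a big context window, here's a back-of-envelope sketch. The layer/head numbers are just illustrative assumptions (roughly a Qwen2.5-72B-shaped model with GQA at ~Q4 quantization), not measurements:

```python
# Back-of-envelope memory estimate for running a quantized LLM locally.
# All configuration numbers are illustrative assumptions, not specs.

def model_memory_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (keys + values, fp16 by default)."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return context_tokens * per_token_bytes / 1e9

weights = model_memory_gb(72, 4.8)  # ~4-bit GGUF-style quantization
kv = kv_cache_gb(32_768, layers=80, kv_heads=8, head_dim=128)
print(f"weights ~{weights:.0f} GB, 32k-token KV cache ~{kv:.0f} GB")
# -> roughly 43 GB of weights plus ~11 GB of KV cache, which fits
#    comfortably in 128 GB of unified memory; a 16 GB machine is stuck
#    with much smaller or more heavily quantized models.
```

Generation speed is mostly memory-bandwidth-bound at batch size 1, so the weight footprint above is also the main thing that determines tok/sec.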
If you need to do some work offline, or the place you work blocks access to cloud providers for some reason, it's not a bad way to go, really. Note that heavy LLM use can drain the battery in about an hour if you're unplugged.