I am! I moved from a shoebox Linux workstation with 32GB of RAM and a 12GB RTX 3060 to a 256GB M3 Ultra, mainly for the unified memory.
I've only had it a couple of months, but so far it's proving its worth in the quality of LLM output, even quantized.
I generally run Qwen3-VL 235B at Q4_K_M quantization so that it fits; that leaves me plenty of RAM for workstation tasks while delivering around 30 tok/s.
I use the smaller Qwen3 models (like qwen3-coder) in tandem; they naturally run much faster, and I tend to run them at higher quants, up to Q8, for quality.
The biggest boon of all that RAM, I've found, is being able to run the models with full context allocated, which lets me hand them larger and more complicated tasks than I could before. That alone makes the money worth it, IMO.
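For anyone curious what that looks like in practice, here's a minimal sketch using llama-cpp-python with a GGUF Q4_K_M file (the model path, context size, and prompt are placeholders, not my exact setup, and this only exercises the text side of the model):

    from llama_cpp import Llama

    # Load a Q4_K_M GGUF with a large context window allocated up front.
    # n_gpu_layers=-1 offloads every layer to the Metal backend on Apple Silicon,
    # which is where unified memory pays off.
    llm = Llama(
        model_path="models/Qwen3-VL-235B-Q4_K_M.gguf",  # placeholder path
        n_ctx=131072,       # allocate the full context you intend to use
        n_gpu_layers=-1,    # offload all layers into unified memory
        verbose=False,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize this repo's build steps."}],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])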
I did manage to get glm-4.7 (a 358B model) running at Q3 quantization; its output is adequate quality-wise, though it only delivers about 15 tok/s, and I had to cut the context down to 128k to leave enough room for the desktop.
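The rough arithmetic for why Q3 was the ceiling on 256GB looks like this (the effective bits-per-weight figures are approximations for the K-quant families, not exact file sizes):

    # Back-of-envelope weight-memory estimate; bits-per-weight values
    # are approximate, and actual GGUF sizes vary by quant mix.
    PARAMS = 358e9  # parameter count of the model mentioned above

    for name, bits in [("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
        gb = PARAMS * bits / 8 / 1e9
        print(f"{name}: ~{gb:.0f} GB of weights")

    # Q3_K_M: ~175 GB -> fits in 256 GB alongside a 128k KV cache and the desktop
    # Q4_K_M: ~215 GB -> too tight once context and macOS take their share
    # Q8_0:   ~380 GB -> doesn't fit at all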
If you get something this big, it's a powerhouse, but not nearly as much of one as a dedicated Nvidia GPU rig. The point is to be able to run models _adequately_, not at production speeds, and get your work done. I found the price/performance/energy usage compelling at this level, and I'm very satisfied.
I think this is fun! I ended up with 92/102; a few were non-obvious. If I had a critique, it's that the AI entries were repeated on multiple occasions, which made the decision between them trivial.
If you can find some public domain literature from the 20th century, it would be a much harder game.
Thanks for the feedback! I've gone ahead and added some newer text for the human entries. I also tried adding some randomness to the temperature for LLM generation, which should help with the repeating entries!
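In case it helps anyone building something similar, the idea is just to draw a fresh temperature per generation; here's a rough sketch with the OpenAI Python client (the model name and temperature range are placeholders, not necessarily what the site actually uses):

    import random
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def generate_entry(prompt: str) -> str:
        # A fresh temperature per call keeps successive entries from
        # converging on the same phrasing.
        temperature = random.uniform(0.7, 1.2)
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=200,
        )
        return resp.choices[0].message.content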
I'm going to buck trends and get my hands dirty -- the oil needs changing and the O2 sensor needs replacing on an old (1992) Ford Ranger pickup truck I recently acquired. Bought it for a song, and it's definitely not a looker, but it's still going strong (especially after the small age-related repairs I'm doing).
2.3L 4-cylinder. Good enough for the light truck tasks I usually end up having. It's the last year of the first generation Rangers. I had a 1988 model with a 4 cylinder when I was young, and -that- one trucked along through all the youthful abuse I threw at it. I'm really looking forward to enjoying this one too.