This is a major reason I haven’t really tried any of these things (despite thinking they’re vaguely neat). I think I’ll wait until… the second half of 2026, most likely. At least by then I can check whether we have local models and hardware that can run them nicely.
Hats off to the folks who have decided to deal with the nascent versions though.
Depending on the definition of "nicely", FWIW I currently run an Ollama server [1] + Qwen Coder models [2] with decent success compared to the big hosted models. Granted, I don't utilize most "agentic" features and still mostly use chat-based interactions.
The server is basically just my Windows gaming PC, and the client is my editor on a macOS laptop.
Most of this effort is so that I can prepare for the arrival of that mythical second half of 2026!
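If anyone wants to try the same kind of split, here's a minimal sketch of talking to a remote Ollama server from another machine over its HTTP API. The LAN address and model tag below are placeholders rather than a specific setup, and the server side needs OLLAMA_HOST=0.0.0.0 set so Ollama listens beyond localhost:

```python
# Minimal sketch: query an Ollama server on another machine (e.g. a Windows
# gaming PC) from a laptop, using Ollama's /api/chat endpoint.
# The host address and model tag are placeholders, not a specific setup.
import requests

OLLAMA_HOST = "http://192.168.1.50:11434"  # hypothetical LAN address of the server
MODEL = "qwen2.5-coder:14b"                # example Qwen Coder tag; pick one that fits your VRAM

def ask(prompt: str) -> str:
    resp = requests.post(
        f"{OLLAMA_HOST}/api/chat",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # return a single JSON object instead of a token stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(ask("Write a Python function that parses an ISO 8601 date string."))
```

Editor plugins that speak to an OpenAI-compatible or Ollama endpoint can point at the same address, which is what makes the "client is my editor on a laptop" part work.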
Thanks for sharing your setup! I'm also very interested in running AI locally. In which contexts are you experiencing decent success? eg debugging, boilerplate, or some other task?
I'm running Qwen via Ollama on my 14-inch M4 Max with the OpenWebUI interface; it's silly easy to set up.
Not that it's particularly useful, I just like the idea of having so much compressed knowledge on my machine in just ~20 GB. In fact I disabled all the Siri features because they're dogshit.
When ChatGPT, then Llama, then Alpaca came out in rapid succession, I decided to hold off a year before diving in. That was definitely the right choice at the time; it's becoming less of the right choice all the time.
In particular it’s important to get past the whole need-to-self-host thing. Like, I used to be holding out for when this stuff would plateau, but that keeps not happening, and the things we’re starting to be able to build in 2025 now that we have fairly capable models like Claude 4 are super exciting.
If you just want locally runnable commodity “boring technology that just works” stuff, sure, cool, keep waiting. If you’re interested in hacking on interesting new technology (glances at the title of the site) now is an excellent time to do so.
I wouldn’t want to become dependent on something like OpenAI, at least not until we see what the “profitable” version of the company is.
If they have to enshittify, I don’t want that baked into my workflow. If they have to raise prices, that changes the local-vs-remote trade-off. If they manage to lower prices, then the cost of running locally will be reduced as well.
I’m also not sure what the LLMs that I’d want to use look like. No real deal-maker applications have shown up so far; if the good application ends up being something like “integrate it into neovim and suggest completions as you type” obviously I won’t want to hit the network for that.
I don’t quite see the point in waiting. If you’re using something like LM Studio, you just download the latest and greatest and you’re on your way; where is the fatigue part?
I can understand it maybe if you’re spending hours setting things up, but to me these are download-and-go.
Yeah, that’s why I said waiting a year after Llama v1 was good. By that point llama.cpp, LM Studio and Ollama were all pretty well established and a lot of low-hanging fruit around performance and memory mapping stuff was picked.
It is completely unreasonable to buy the hardware to run a local model and only use it 1% of the time. It will be unreasonable in 2026 and probably very long after that.
Maybe something like a collective that buys the GPUs together and then uses them without leaking data could work.
Over time you would assume the models will get more efficient and the hardware will get better, to the point that buying a massive new GPU with boatloads of VRAM is just not necessary.
Maybe 128 GB of VRAM becomes the new mid-tier and most LLMs can fit into that nicely and do everything one wants from an LLM.
Given how fast LLMs are progressing, it wouldn’t surprise me if we reach this point by 2030.
Considering there were two generations (around 4.5 years) of top-tier consumer GPUs (3090/4090) stuck at 24 GB VRAM max, and the current one (5090) "only" bumped it up to 32 GB, I think you'll be waiting more than 5 years before 128 GB of VRAM comes to a mid-tier GPU. 12–16 GB is currently mid-tier and has been since LLMs became "a thing".
I hope I'm wrong though, and we see a large bump soon. Even just 32 GB in the mid tier would be huge.
I'm really tempted to try out a Mac Studio with 256+ GB Unified Memory (192 GB VRAM), but it is sadly out of my budget at the moment. I know there is a bandwidth loss, but being able to run huge models and huge contexts locally would be quite nice.
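For a rough sense of what "fits nicely" means here: weight memory is basically parameter count times bytes per parameter at a given quantization, with KV cache and runtime overhead on top, so the numbers from the sketch below are optimistic lower bounds rather than exact requirements.

```python
# Back-of-envelope weight-memory estimate: params * bits-per-param / 8.
# KV cache, context length, and runtime overhead add more on top, so treat
# these as optimistic lower bounds, not exact requirements.
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9  # decimal GB

for params in (7, 14, 32, 70, 120):
    q4 = weight_gb(params, 4.5)   # ~4-bit quantization (extra ~0.5 bit for scales)
    f16 = weight_gb(params, 16)   # full fp16/bf16 weights
    print(f"{params:>4}B params: ~{q4:5.1f} GB at 4-bit, ~{f16:6.1f} GB at fp16")
```

By that math a 4-bit 70B-class model already needs roughly 40 GB before any context, which is why 24–32 GB cards feel tight and why 128 GB (or a big unified-memory Mac) would be comfortable.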
I have a modified tiered approach that I adopted without consciously thinking hard about it.
I use AI mostly for problems on my fringes. Things like manipulating some Excel table somebody sent me, with invoice data from one of our suppliers and some moderately complex question that they (pure business people) don't know how to handle, where simple formulas would not be sufficient and I would have to start learning Power Query. I can tell the AI exactly what I want in human language and don't have to learn a system that I only use because people here use it to fill holes not yet served by "real" software (databases, automated EDI data exchange, and code that automates the business processes). It works great, and it saves me hours on fringe tasks that people outsource to me but that I don't really want to deal with too much either.
For example, I also don't check various vendors and models against one another. I still stick to whatever the default is from the first vendor I signed up with, and so far it worked well enough. If I were to spend time checking vendors and models, the knowledge would be outdated far too quickly for my taste.
On the other hand, I don't use it for my core tasks yet. Too much movement in this space; I would have to invest many hours in figuring out how to integrate this new stuff when the "old" software approach is more than sufficient, still more reliable, and vastly more economical (once implemented).
Same for coding. I ask AI on the fringes where I don't know enough, but in the core that I'm sufficiently proficient with I wait for a more stable AI world.
I don't solve complex sciency problems, I move business data around. Many suppliers, many customers, different countries, various EDI formats, everybody has slightly different data and naming and procedures. For example, I have to deal with one vendor wanting some share of pre-payment early in the year, which I have to apply across thousands of invoices over the year, tracking when we have to pay hundreds or thousands of invoices, all with different payment conditions and timings. If I were to ask the AI, I would have to be so super specific I may as well write the code.
But I love AI on the not-yet-automated edges. I'm starting to show others how they can ask some AI, and many are surprised how easy it is - when you have the right task and know exactly what you have and what you want. My last colleague-convert was someone already past retirement age (still working on the business side). I think this is a good time to gradually teach regular employees some small use cases to get them interested, rather than some big top-down approach that mostly creates more work and many people then rightly question what the point is.
About politically-tinged questions like whether I should rather use an EU-made AI like the one this topic is about, or one from the US vendors that already dominate much of the software world, I don't care at this point, because I'm not yet creating any significant dependencies. I am glad to see it happening though (as a citizen of an EU country).
> About politically-tinged questions like whether I should rather use an EU-made AI like the one this topic is about, or one from the US vendors that already dominate much of the software world, I don't care at this point, because I'm not yet creating any significant dependencies. I am glad to see it happening though (as a citizen of an EU country).
Another nice thing about waiting a bit: one can see how much (if any) the EU models lose from paying the “do things somewhat ethically” price. I suspect it won’t be much of a penalty.