I'm in the same boat. I use AI to generate tons of small things for work. None of it is shareable online because it's unique to my workplace and it's not some generic reusable tool, and for the most part the scripts are boring. Their most interesting attribute is how little effort they took, not their originality or grandness of scale.

Gemma and GPT-OSS are both useful. Neither is a threat to their frontier models, though.

Works great for these types of MoE models. Having large amounts of VRAM lets you run different models in parallel easily, or run with actually useful context sizes. Dense models can get sluggish, though. AMD's ROCm support has been a little rough for Stable Diffusion work (memory issues leading to application stability problems), but it's worked well with LLMs, as has Vulkan.

I wish AMD would get around to adding NPU support on Linux for it, though; there's more potential there to unlock.


What was wrong is that they had to foot the bill. Now Google does the hard and expensive part for them.


I meant what was wrong from the perspective of the user who complained, not from MS's perspective.


What's stopping a vim plugin from doing similar data exfiltration? Tons of people blindly install LazyVim, SpaceVim, and a bunch of similar vim tooling without vetting any of it.


In general? Nothing, really.

I think the culture of the (neo)vim community is a bit more technical, and people are quicker to sound the alarm if anyone tries something shady.

But, in any event, I hand-roll my own config and every plugin I install is inspected by me. When I pull changes, I check the diffs for anything shady. If a plugin is simple enough, I will just integrate it into my own stuff.


They either updated it or you quoted it wrong, but the article now says Devstral is open-weights.


Yeah, they’ve updated it. Here’s the old version: https://web.archive.org/web/20260128034831mp_/https://allena...


Yes! We updated the blog, thanks for flagging the mistake.


> when there’s also Anthropic and Gemini offerings?

For average people, the global competitors are putting up near-identical services at 1/10th the cost. Anthropic, Google, and OpenAI may have a corporate sales advantage in their security posture and domestic alignment, or in being 5% better at some specific task, but the populace at large isn't going to cough up $25 for that difference. Beyond the first month or two of the novelty phase, it's not apparent the average person is willing to pay for AI services at all.


With "only" 32B active params, you don't necessarily need to. We're straying from common home users to serious enthusiasts and professionals but this seems like it would run ok on a workstation with a half terabyte of RAM and a single RTX6000.

But to answer your question directly: tensor parallelism. https://github.com/ggml-org/llama.cpp/discussions/8735 https://docs.vllm.ai/en/latest/configuration/conserving_memo...
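
As a rough sketch of what that looks like on the vLLM side (the model name here is a placeholder, not a specific checkpoint):

    from vllm import LLM, SamplingParams

    # tensor_parallel_size splits each layer's weights across 2 GPUs,
    # so no single card has to hold the whole model.
    # Model name is a placeholder for illustration only.
    llm = LLM(model="some-org/some-big-moe", tensor_parallel_size=2)
    outputs = llm.generate(["Why is the sky blue?"],
                           SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)

llama.cpp gets at the same idea from the other direction with its row split mode; the first link above covers that.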


You forgot "shit spewer": one who creates the shit that someone else then has to deal with (usually organizational peers or the team beneath them).


I've always called those folks seagulls - fly in, shit on everything, and take off again without having to deal with the consequences.


They are spending hundreds of billions of dollars on data centers filled with GPUs that each cost more than an average car, and then months training models, all to serve your current $20/mo plan. Do you legitimately think there's a cheaper or free alternative of the same quality?

I guess you could technically run the huge leading open-weight models using large disks as RAM and get close to the "same quality", but at "heat death of the universe" speeds.
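
If anyone wants to try it anyway, here's a minimal sketch assuming llama-cpp-python; memory-mapping is what lets the weights stream in from disk, and the model path is a placeholder:

    from llama_cpp import Llama

    # use_mmap=True (the default) memory-maps the GGUF file, so weights
    # page in from disk on demand instead of loading fully into RAM.
    # The path below is hypothetical; point it at any large GGUF file.
    llm = Llama(
        model_path="/models/huge-open-weights.gguf",
        use_mmap=True,
        n_ctx=4096,
    )
    print(llm("Hello", max_tokens=32)["choices"][0]["text"])

Every token that touches cold weights waits on disk reads, hence the speeds.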

