Hacker News | libraryofbabel's comments

I have come to think “predict the next token” is not a useful way to explain how LLMs work to people unfamiliar with LLM training and internals. It’s technically correct, but at this point saying that while ignoring things like RLVR training and mechanistic interpretability is about as useful as framing a conversation with a person as “engaging with a human brain generating tokens” and ignoring psychology.

At least AI-haters don’t seem to be talking about “stochastic parrots” quite so much now. Maybe they finally got the memo.


>“predict the next token” is not a useful way

That is the exact thing to say, because that is exactly what it does, regardless of how it does so.

It is not useful to say it if you are an AI-shill though. You brought up “AI-hater”, so I think I am entitled to bring up “AI-shills”.


My neurons are also just passing electric signals back and forth and exchanging water and salts with the rest of my body.

> just passing electric signals back and forth

Ok, feel free to call yourself a toaster, I don't mind!


What, reductionism only works when you do it?

I didn't

I mean that's really just a comparison to how silicon circuits work though isn't it.

"Thinking rocks" vs "thinking meat sacks" isn't much of a distinction really.

Conversely, if you approach conversations the same way an LLM does, and just repeat what you've heard other people say a lot without actually knowing what it means, then you're also likely to be compared to a feathery chatterbox.


I think talking to people unfamiliar with LLM training using words like "RLVR training and mechanistic interpretability" is about as useful as a grave robber in a crematorium.

Obviously you don’t just say those words and leave it at that. Both those things can be explained in understandable terms. And even having a superficial sense of what they are gives people a better picture of what modern LLMs are all about than tired tropes from three years ago like “they’re just trained to predict the next token in the training data, therefore…”

Must one be an "AI-hater" to use the term "stochastic parrot"? The term is probably a response to all the emergent-AGI claims and pointless discussions about LLMs being conscious.

> stochastic parrots

I prefer to use the term "spicy autocomplete" myself.


Sampling over a probability distribution is not as catchy as "stochastic parrot", but I have personally stopped telling believers that their imagined event horizon of transistor scale is not going to deliver them to their wished-for automated utopia, because one cannot reason with people who did not reach their conclusions by reasoning.

“Stochastic parrots” only stopped because AI fanboys stopped screaming “AGI” and “it will replace everyone”. Maybe they finally got the memo?

Technical concepts can be broken down into ideas anyone can understand if they're interested. Token prediction is at the core of what these tools do, and is a good starting point for more complex topics.
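As a starting point, "token prediction" can be sketched in a few lines. This is a toy illustration, not how any real model is implemented: the vocabulary and the scores (logits) here are made up, standing in for what a deep network would produce over a vocabulary of ~100k tokens.

```python
import math
import random

# Hypothetical tiny vocabulary and model scores (logits) for the next token.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 1.0, 0.1]

def softmax(xs):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# "Sampling over a probability distribution": pick the next token at random,
# weighted by the model's probabilities.
next_token = random.choices(vocab, weights=probs)[0]
print(next_token)
```

Everything an LLM emits is produced by repeating this step, feeding each chosen token back in as context; the interesting part is how the logits get computed, which is where the debate above really lives.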

On the other hand, calling these tools "intelligent", capable of "reasoning" and "thought", is not only more confusing and impossible to simplify, but dishonest and borderline gaslighting.


How nice for you. But since you totally neglected to say anything about your use-case or schema or query patterns, it’s impossible to know what this even means. Some use cases can trivially be done without any explicit transactions and you’re not giving anything up. For others (usually, something where you need to enforce invariants under high concurrency writes or writes+reads on the same data across multiple tables), transactions are pretty critical. So, it depends.

I actually like this article a lot. I do a bit of teaching, and I imagined the ideal audience for this as a smart junior engineer who knows SQL and has encountered transactions but maybe doesn’t really understand them yet. I think introducing things via examples of isolation anomalies (which most engineers will have seen examples of in bugs, even if they didn’t fully understand them) gives the explanation a lot more concreteness than starting with serializability as a theoretical concept as GP is proposing. Sure, strict serializability is a powerful idea that ties all this together and is more satisfying for an expert who already knows this stuff. But for someone who is just learning, you have to motivate it first.

If anything, I’d say it might be better to start with the lower isolation levels first, highlight the concurrency problems that can arise with them, and gradually introduce higher isolation levels until you get to serializability. That feels a bit more intuitive rather than downward progression from serializability to read uncommitted as presented here.
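The classic anomaly to motivate this with is the lost update. Here is a minimal sketch, with the two "transactions" simulated as explicit interleavings rather than real database sessions: under an unisolated interleaving one deposit silently vanishes, while a serial schedule (what SERIALIZABLE guarantees the outcome is equivalent to) keeps both.

```python
def lost_update_schedule(balance):
    # Interleaved: both transactions read the same value before either writes.
    t1_read = balance          # T1: SELECT balance
    t2_read = balance          # T2: SELECT balance
    balance = t1_read + 100    # T1: UPDATE balance (deposit $100)
    balance = t2_read + 50     # T2: UPDATE balance (deposit $50, clobbers T1)
    return balance

def serial_schedule(balance):
    # Equivalent to running T1 to completion, then T2.
    balance = balance + 100    # T1
    balance = balance + 50     # T2
    return balance

print(lost_update_schedule(0))  # 50  -- T1's deposit was lost
print(serial_schedule(0))       # 150 -- both deposits survive
```

Starting from a concrete bug like this, each higher isolation level can be introduced as "rules out one more bad interleaving," which matches the bottom-up ordering suggested above.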

It also might be nice to see a quick discussion of why people choose particular isolation levels in practice, e.g. why you might make a tradeoff under high concurrency and give up serializability to avoid waits and deadlocks.

But excellent article overall, and great visualizations.


Is that actually true, if the charges differed at the 12th decimal place only? That’s non-obvious to me.


Yes, because matter would carry a residual charge that would massively overpower gravity, even at that small a discrepancy.
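A back-of-the-envelope check on that claim, using standard physical constants: between two protons, electrostatic repulsion beats gravitational attraction by roughly a factor of 10^36, so even a residual charge of 1 part in 10^12 leaves electrostatics ~10^12 times stronger than gravity.

```python
# Standard constants (SI units).
k = 8.988e9      # Coulomb constant, N m^2 / C^2
e = 1.602e-19    # elementary charge, C
G = 6.674e-11    # gravitational constant, N m^2 / kg^2
m_p = 1.673e-27  # proton mass, kg

# Electrostatic vs gravitational force between two protons (distance cancels).
full_ratio = k * e**2 / (G * m_p**2)

# If the net charge per proton were only 1e-12 of e, the residual
# electrostatic force scales by (1e-12)^2 = 1e-24.
residual_ratio = full_ratio * (1e-12)**2

print(f"full: {full_ratio:.2e}, residual: {residual_ratio:.2e}")
```

So bulk matter with that imbalance would repel itself about a trillion times more strongly than gravity pulls it together, which is why the cancellation has to be exact to extraordinary precision.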


To play devil's advocate, maybe there is a surplus or deficit of 1 part in 10^12 in electrons relative to protons.


There’s a name for that: the Anthropic principle. And it is deeply unsatisfying as an explanation.

And does it even apply here? If the charge on the electron differed from the charge on the proton at just the 12th decimal place, would that actually prevent complex life from forming? Citation needed for that one.

I agree with OP. The unexplained symmetry points to a deeper level.


> There’s a name for that: the Anthropic principle. And it is deeply unsatisfying as an explanation.

i feel the same about many worlds


I find the anthropic principle fascinating.

I was born into this world at a certain point in time. I look around, and I see an environment compatible with me: air, water, food, gravity, time, space. How deep does this go? Why am I not an ant or a bacterium?


Presumably your parents weren't ants?


Slightly tangential, but I discovered recently that the famous literary critic Harold Bloom was a huge fan of Ursula Le Guin and rated her one of the great canonical writers of the 20th century, in all of literature not just sci-fi. Also, they never met but they struck up a polite friendship over email when they were both old and chatted back and forth.

Some might say this raises the stature of Ursula Le Guin. I consider it rather as raising the stature of Harold Bloom. He recognized how she transcended genre and belongs alongside (or perhaps above) writers of highbrow literary fiction.


> He recognized how she transcended genre and belongs alongside (or perhaps, above) writers of highbrow literary fiction.

In the 70s and 80s, Le Guin and other SFF authors were very aware of the literary divide that often regarded most science fiction and fantasy as little better than pulp fiction. Gene Wolfe's essays and speeches in Castle of Days touch on this several times.

What changed was the arrival of a new generation of literary critics, researchers, and readers who recognized greatness in some of the SFF works of the era.


From a blog post last month by the same author:

> Today, I would say that about 90% of my code is authored by Claude Code. The rest of the time, I’m mostly touching up its work or doing routine tasks that it’s slow at, like refactoring or renaming.

> I see a lot of my fellow developers burying their heads in the sand, refusing to acknowledge the truth in front of their eyes, and it breaks my heart because a lot of us are scared, confused, or uncertain, and not enough of us are talking honestly about it. Maybe it’s because the initial tribal battle lines have clouded everybody’s judgment, or maybe it’s because we inhabit different worlds where the technology is either better or worse (I still don’t think LLMs are great at UI for example), but there’s just a lot of patently unhelpful discourse out there, and I’m tired of it.

https://nolanlawson.com/2026/01/24/ai-tribalism/

If you're responding to this with angry anti-AI rants (or wild AI hype), might want to go read that post.


So many people responding to you with snarky comments or questioning your programming ability. It makes me sad. You shared a personal take (in response to TFA which was also a personal take). There is so much hostility and pessimism directed at engineers who simply say that AI makes them more productive and allows them to accomplish their goals faster.

To the skeptics: by all means, don't use AI if you don't want to; it's your choice, your career, your life. But I am not sure that hitching your identity to hating AI is altogether a good idea. It will make you increasingly bitter as these tools improve further and our industry and the wider world slowly shifts to incorporate them.

Frankly, I consider the mourning of The Craft of Software to be just a little myopic. If there are things to worry about with AI they are bigger things, like widespread shifts in the labor force and economic disruption 10 or 20 years from now, or even the consequences of the current investment bubble popping. And there are bigger potential gains in view as well. I want AI to help us advance the frontiers of science and help us get to cures for more diseases and ameliorate human suffering. If a particular way of working in a particular late-20th and early-21st century profession that I happen to be in goes away but we get to those things, so be it. I enjoy coding. I still do it without AI sometimes. It's a pleasant activity to be good at. But I don't kid myself that my feelings about it are all that important in the grand scheme of things.


I do not really agree with the below, but the logic is probably:

1) Engineering investment at companies generally pays off in multiples of what is spent on engineering time. Say you pay 10 engineers $200k / year each and the features those 10 engineers build grow yearly revenue by $10M. That’s a 4x ROI and clearly a good deal. (Of course, this only applies up to some ceiling; not every company has enough TAM to grow as big as Amazon).

2) Giving engineers near-unlimited access to token usage means they can create even more features, in a way that still produces positive ROI per token. This is the part I disagree with most. It’s complicated. You cannot just ship infinite slop and make money. It glosses over massive complexity in how software is delivered and used.

3) Therefore (so the argument goes) you should not cap tokens and should encourage engineers to use as many as possible.

Like I said, I don’t agree with this argument. But the key thing here is step 1. Engineering time is an investment to grow revenue. If you really could get positive ROI per token in revenue growth, you should buy infinite tokens until you hit the ceiling of your business.
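The arithmetic in step 1, spelled out. The figures are the hypothetical ones above, not real data:

```python
engineers = 10
salary = 200_000
cost = engineers * salary        # $2M/year of engineering spend
revenue_gain = 10_000_000        # features grow yearly revenue by $10M

# Net return on the engineering investment.
net_roi = (revenue_gain - cost) / cost
print(net_roi)                   # 4.0 -- the "4x ROI" above
```

The argument in steps 2 and 3 amounts to claiming this ratio stays positive as you add marginal token spend, which is exactly the premise being disputed.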

Of course, the real world does not work like this.


Is the time it takes for an engineer to implement PRs the bottleneck in generating revenue for a software product?

In my experience it takes humans to know what to build to generate revenue, and most of the time building that product is not spent coding at all. Coding is like the last step. Spending $1k/day in tokens only makes sense if you know exactly what to build already to generate this revenue. Otherwise you are building what exactly? Is the LLM also doing the job of the business side of the house to decide what to build?


Right, I understand of course that AI usage and token costs are an investment (probably even a very good one!).

But my point is more that calling $1k a day cheap is ridiculous, even for a company that expects an ROI on that investment. There are risks involved and, as you said, diminishing returns on software output.

I find AI bros' view of the economics of AI usage strange. It's reasonable to me to say you think it's a good investment, but to say it's cheap is a whole different thing.


Oh sure. We agree on all you said. I wouldn’t call it cheap either. :)

The best you can say is “high cost but positive ROI investment.” Although I don’t think that’s true beyond a certain point either, certainly not outside special cases like small startups with a lot of funding trying to build a product quickly. You can’t just spew tokens about and expect revenue to increase.

That said, I do reserve some special scorn for companies that penny-pinch on AI tooling. Any CTO or CEO who thinks a $200/month Claude Max subscription (or equivalent) for each developer is too much money to spend really needs to rethink their whole model of software ROI and costs. You're often paying your devs >$100k/yr and you won't pay ~$2,400/yr to make them more productive? I understand there are budget and planning cycle constraints blah blah, but… really?!
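The penny-pinching arithmetic is easy to make concrete. Using the example figures above (not real pricing data beyond the quoted $200/month tier):

```python
salary = 100_000
subscription = 200 * 12          # $200/month ~ $2,400/year per developer

# Productivity gain at which the subscription pays for itself.
breakeven_gain = subscription / salary
print(f"{breakeven_gain:.1%}")   # 2.4% -- gain needed to break even
```

In other words, the tooling only has to make a developer a couple of percent more productive to justify itself, which is the comparison the penny-pinchers are arguably getting wrong.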


It’s a cute aphorism, and useful sometimes, but when you look closely or from different angles hedgehogs often have foxy attributes, or vice versa. Some moments in a person’s life they look more like a fox, other times like a hedgehog. Perhaps the distinction applies better to specific ideas or to pieces of work than as a fixed essence of a person.

Einstein might seem like a quintessential hedgehog (surely the principle of relativity is a Big Hedgehog Idea if ever there was one). Then you learn he once invented a refrigerator. Tolstoy looks like an obvious fox earlier in his writing career, but increasingly a hedgehog towards the end of his life. And slightly less exaltedly, I feel like a fox in some contexts and a hedgehog in others. It might change day to day, or depend on who I'm talking to.

(People are complicated. All aphorisms are wrong, but some are useful I guess. I still quote this one sometimes.)

