jonas21's comments

> At this point it seems pretty clear that LLMs are having a major impact on how software is built, but for almost every other industry the practical effects are mostly incremental.

Even just a year ago, most people thought the practical effects in software engineering were incremental too. It took another generation of models and tooling to get to the point where it could start having a large impact.

What makes you think the same will not happen in other knowledge-based fields after another iteration or two?


> most people thought the practical effects in software engineering were incremental too

Hum... Are you saying it's having clear positive (never mind "transformative") impact somewhere? Can you point to any place we can see observable clear positive impact?


It doesn’t need to provide “observable clear positive impact”. As long as the bosses think it improves numbers, it will be used. See offshoring or advertising everywhere.

I know many companies that have replaced Customer Support agents with LLM-based agents. Replacing support with AI isn't new, but what is new is that the LLM-based ones have higher CSAT (customer satisfaction) rates than the humans they are now replacing (ie, it's not just cost anymore... It's cost and quality).

Well, I as a Customer who had to deal with AI bots as Customer Service have significantly lower Customer Satisfaction, because I don't wanna deal with some clanker who doesn't really understand what I am talking about.

Software is more amenable to LLMs because there is a rich source of highly relevant training data that corresponds directly to the building blocks of software, and the "correctness" of software is quasi-self-verifiable. This isn't true for pretty much anything else.

The more verifiable the domain the better suited. We see similar reports of benefits in advanced mathematics research from Terence Tao, granted some of those reports seem to amount to the LLM having had data in its training corpus that was relevant to the proof and that very few people knew existed. Still, verifiably correct domains are well-suited.

So the concept of formal verification is as relevant as ever, though when building interconnected programs the complexity rises and verification becomes more difficult.


> The more verifiable the domain the better suited.

Absolutely. It's also worth noting that in the case of Tao's work, the LLM was producing Lean and Python code.


I think the solution in harder-to-verify cases is to provide AI (sub-)agents with a really good set of instructions: detailed guidelines on what they should do and how they should think, explore, and break down problems. Potentially tens of thousands of words of instructions to get the LLM to act as a competent employee in the field. Then the models need to be good enough at instruction-following to actually explore the problem in the right way and apply basic intelligence to solve it. Basically, treat the LLM as a competent general knowledge worker that is unfamiliar with the specific field, and give it detailed instructions on how to succeed in that field.

For the easy-to-verify fields like coding, you can train "domain intuitions" directly to the LLM (and some of this training should generalize to other knowledge work abilities), but for other fields you would need to supply them in the prompt as the abilities cannot be trained into the LLM directly. This will need better models but might become doable in a few generations.


> I think the solution in harder-to-verify cases is to provide AI (sub-)agents with a really good set of instructions: detailed guidelines on what they should do and how they should think, explore, and break down problems

Using LLMs to validate LLMs isn't a solution to this problem. If the system can't self-verify then there's no signal to tell the LLM that it's wrong. The LLM is fundamentally unreliable, that's why you need a self-verifying system to guide and constrain the token generation.


Presumably at some point capability will translate to other domains even if the exchange rate is poor. If it can autonomously write software and author CAD files then it can autonomously design robots. I assume everything else follows naturally from that.

> If it can autonomously write software and author CAD files then it can autonomously design robots.

It can't because the LLM can't test its own design. Unlike with code, the LLM can't incrementally crawl its way to a solution guided by unit tests and error messages. In the real world, there are material costs for trial and error, and there is no CLI that allows every aspect of the universe to be directly manipulated with perfect precision.


You don't need perfect precision, just a sufficiently high-fidelity simulation. For example, hypersonic weapon design being carried out computationally was the historical (pre-AI) reason to restrict the export of certain electronics to China.

OpenAI demoed training a model for a robotic hand using this approach years ago.


I have no idea what you are even asking Claude to do here.

I was asking it to see if the wikisource tools are working by looking up a Bible quote. There was no ambiguity about the task itself; what I'm saying is that Claude 'knows' a bunch of things (the Bible has different translations) that it doesn't operationalize when doing a task--issues that would be glaringly obvious to a human who knows the same things.

Maybe I'm missing the point as well, but what did it do wrong?

It seemed like you wanted to see if a search tool was working.

It looked to see. It tried one search using one data source (KJ) and found no matches. The next question would be: is the quote not in there, is the quote mis-remembered, or is there something wrong with the data source? It tries an easier-to-match quote and finds nothing, which it finds odd. So the next step in debugging is to adopt the hypothesis that the KJ Bible datasource is broken, corrupted, or incomplete (or not working for some other reason). So it searches for an easier quote using a different datasource.

The next bit is unclear because it looks like you may have interrupted it, but it seems like it found the passage about Mary in the DR data source. So, using elimination, it now knows the tool works (it can find things) and the DR data source works (it can also find things), which brings it back to the last question of eliminating hypotheses: is the quote wrong for the KJ datasource, or is that datasource broken?

The next query (and maybe the last one I would do, and what it chose) was to search for something guaranteed to be there in the KJ version: the phrase 'Mary'. Then scan through the results to find the quote you want, then re-query using the exact quote you know is there. You get 3 options.

If it can't find "Mary" at all in the KJ dataset, then the datasource is likely broken. If it finds "Mary" but the results don't contain the phrase, then the datasource is incomplete. If the results do contain the phrase, then search for it; if that search doesn't find it, you've narrowed down the issue to "phrase-based search seems to fail". If it does find it, and it's the exact quote it searched for originally, then you know search has an intermittent bug.

This seemed like perfect debugging to me - am I missing something here?

And it even summarized at the end how it could've debugged this process faster: don't waste a few queries up front trying to pin down the exact quote; search for "Mary", get a quote that is in there, then search for that quote.

This seems perfectly on target. It's possible I'm missing something though. What were you looking for it to do?


What I was expecting is that it would pull up the KJV using the results returned from the wiki_source_search tool instead of going for a totally different translation and then doing a text match for a KJV quote

> What I was expecting is that it would pull up the KJV using the results returned from the wiki_source_search

Did you tell it to do that?


If you click through to the study that the Guardian based this article on [1], it looks like it was done by an SEO firm, by a Content Marketing Manager. Kind of ironic, given that it's about the quality of cited sources.

[1] https://seranking.com/blog/health-ai-overviews-youtube-vs-me...


It does have these details for the current generation hardware. And if you want more, click on the link at the top:

https://hackernoon.com/the-long-now-of-the-web-inside-the-in...


Yeah, this is just blogspam. Some guy re-hashing the Hackernoon article, interspersed with his own comments.

I wouldn't be surprised if it's AI.

It's time to come up with a term for blog posts that are just AI-augmented re-hashes of other people's writing.

Maybe blogslop.


That pattern shows up when publishing has near-zero cost and review has no gate. The fix is procedural: define what counts as original contribution and require a quick verification pass before posting. Without an input filter and a stop rule, you get infinite rephrases that drown out the scarce primary work.

You and I must be different kinds of readers.

I’m under the impression that this style of writing is what people wish they got when they asked AI to summarize a lengthy web page. It’s criticism and commentary. I can’t see how you missed out on the passages that add to and even correct or argue against statements made in the Hackernoon article.

In a way I can’t tell how one can believe that “re-hashing [an article], interspersed with [the blogger’s] own comments” isn’t a common blogging practice. If not then the internet made a mistake by allowing the likes of John Gruber to earn a living this way.

And trust that I enjoy a good knee-jerk “slop” charge myself. To me this doesn’t qualify a bit.


What a slog post.

This comment has, I think, made me more sad than anything I've ever read on HN before. David is one of the most thoughtful, critical, and valuable voices on the topic of digital archival, and has been for quite some time. The idea of someone dismissing his review of a much more slop-adjacent article as such is incredibly depressing.

There's also an entire section on "what constitutes genuine helpfulness"

Fair cop, I completely missed that!!

AI capex may or may not flatten in the near future (and I don't necessarily see a reason why it would). But smartphone capex already has.

Like smartphones, AI chips also have a replacement cycle. AI chips depreciate quickly -- not because the old ones go bad, but because the new ones are so much better in performance and efficiency than the previous generation. While smartphones aren't making huge leaps every year like they used to, AI chips still are -- meaning there's a stronger incentive to upgrade every cycle for these chips than smartphone processors.


> AI chips depreciate quickly -- not because the old ones go bad

I've heard that it's exactly that, reports of them burning out every 2-3 years. Haven't seen any hard numbers though.


The lifetime curve is something they can control. If they can predict the replacement rate, it makes sense to make chips go bad on the same schedule, saving on manufacturing costs.

Operating a car (i.e. driving) is certainly not deterministic. Even if you take the same route over and over, you never know exactly what other drivers or pedestrians are going to do, or whether there will be unexpected road conditions, construction, inclement weather, etc. But through experience, you build up intuition and rules of thumb that allow you to drive safely, even in the face of uncertainty.

It's the same programming with LLMs. Through experience, you build up intuition and rules of thumb that allow you to get good results, even if you don't get exactly the same result every time.


> It's the same programming with LLMs. Through experience, you build up intuition and rules of thumb that allow you to get good results, even if you don't get exactly the same result every time.

Friend, you have literally described a nondeterministic system. LLM output is nondeterministic. Identical input conditions result in variable output conditions. Even if those variable output conditions cluster around similar ideas or methods, they are not identical.


The problem is that this is completely false. LLMs are actually deterministic. There are a lot more input parameters than just the prompt. If you're using a piece of shit corpo cloud model, you're locked out of managing your inputs because of UX or whatever.


Ah, we've hit the rock bottom of arguments: there's some unspecified ideal LLM model that is 100% deterministic that will definitely 100% do the same thing every time.


We've hit rock bottom of rebuttals, where not only is domain knowledge completely vacant, but you can't even be bothered to read and comprehend what you're replying to. There is no non-deterministic LLM. Period. You're already starting off from an incoherent position.

Now, if you'd like to stop acting like a smug ass and be inquisitive as per the commenting guidelines, I'd be happy to tell you more. But really, if you actually comprehended the post you're replying to, there would be no need since it contains the piece of the puzzle you aren't quite grasping.


> There is no non-deterministic LLM.

Strange then that the vast majority of LLMs that people use produce non-deterministic output.

Funnily enough I had literally the same argument with someone a few months back in a friends group. I ran the "non-shitty non-corpo completely deterministic model" through ollama... And immediately got two different answers for the same input.

> Now, if you'd like to stop acting like a smug ass and be inquisitive as per the commenting guidelines,

Ah. Commenting guidelines. The ones that tell you not to post vague allusions to something, not to be dismissive of what others are saying, to respond to the strongest plausible interpretation of what someone says, etc.? Those ones?


> Strange then that the vast majority of LLMs that people use produce non-deterministic output.

> I ran the "non-shitty non-corpo completely deterministic model" through ollama... And immediately got two different answers for the same input.

With deterministic hardware in the same configuration, using the same binaries, providing the same seed, the same input sequence to the same model weights will produce bit-identical outputs. Where you can get into trouble is if you aren't actually specifying your seed, or with non-deterministic hardware in varying configurations, or if your OS mixes entropy with the standard pRNG mechanisms.

Inference is otherwise fundamentally deterministic. In implementation, certain things like thread-scheduling and floating-point math can be contingent on the entire machine state as an input itself. Since replicating that input can be very hard on some systems, you can effectively get rid of it like so:

    ollama run [whatever] --seed 123 --temperature 0 --num-thread 1
A note that "--temperature 0" may not strictly be necessary. Depending on your system, setting the seed and restricting to a single thread will be sufficient.

These flags don't magically change LLM formalisms. You can read more about how floating point operations produce non-determinism here:

https://arxiv.org/abs/2511.17826

In this context, forcing single-threading bypasses FP-hardware's non-associativity issues that crop up with multi-threaded reduction. If you still don't have bit-replicated outputs for the same input sequence, either something is seriously wrong with your computer or you should get in touch with a reputable metatheoretician because you've just discovered something very significant.
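
To make the non-associativity point concrete, here is a minimal sketch in plain Python (not tied to any particular inference stack) showing that regrouping or reordering the same floating-point additions changes the result -- which is exactly what a varying multi-threaded reduction order does:

    # Floating-point addition is not associative: the same numbers summed
    # in a different grouping or order can give different results.
    import random

    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)   # 1.0
    print(a + (b + c))   # 0.0 on IEEE 754 doubles -- the 1.0 is absorbed first

    # Summing the same values in different (shuffled) orders, as a
    # multi-threaded reduction effectively does, drifts the same way.
    xs = [1e16, 1.0, -1e16, 1e-8] * 1000
    totals = set()
    for seed in range(5):
        random.seed(seed)
        random.shuffle(xs)
        totals.add(sum(xs))
    print(totals)  # typically more than one distinct total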

> Those ones?

Yes those ones. Perhaps in the future you can learn from this experience and start with a post like the first part of this, rather than a condescending non-sequitur, and you'll find it's a more constructive way to engage with others. That's why the guidelines exist, after all.


> These flags don't magically change LLM formalisms. You can read more about how floating point operations produce non-determinism here:

Basically what you're saying is "for 99.9% of use cases and how people use them they are non-deterministic, and you have to very carefully work around that non-determinism to the point of having workarounds for your GPU and making them even more unusable"

> In this context, forcing single-threading bypasses FP-hardware's non-associativity issues that crop up with multi-threaded reduction.

Translation: yup, they are non-deterministic under normal conditions. Which the paper explicitly states:

--- start quote ---

existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs.

--- end quote ---

> If you still don't have bit-replicated outputs for the same input sequence, either something is seriously wrong with your computer or you should get in touch with a reputable metatheoretician because you've just discovered something very significant.

Basically what you're saying is: If you do all of the following, then the output will be deterministic:

- workaround for GPUs with num_thread 1

- temperature set to 0

- top_k to 0

- top_p to 0

- context window to 0 (or always do a single run from a new session)

Then the output will be the same all the time. Otherwise even "non-shitty corp runners" or whatever will keep giving different answers for the same question: https://gist.github.com/dmitriid/5eb0848c6b274bd8c5eb12e6633...
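
If you want to check either configuration on your own machine, here is a minimal sketch -- assuming a local ollama server on its default port and substituting whatever model tag you actually have pulled (the name below is a placeholder) -- that sends the same prompt twice with the pinned options via ollama's REST API and compares the raw responses:

    # Call a local ollama server twice with identical, fully pinned inputs
    # and check whether the two responses are bit-identical.
    import json
    import urllib.request

    def generate(prompt: str) -> str:
        payload = {
            "model": "llama3",  # placeholder model tag
            "prompt": prompt,
            "stream": False,
            "options": {"seed": 123, "temperature": 0, "num_thread": 1},
        }
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    a = generate("Why is the sky blue?")
    b = generate("Why is the sky blue?")
    print(a == b)  # True only if inference is reproducible on this setup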

Edit. So what we should be saying is that "LLM models as they are normally used are very/completely non-deterministic".

> Perhaps in the future you can learn from this experience and start with a post like the first part of this

So why didn't you?


> The problem is that this is completely false. LLMs are actually deterministic. There are a lot more input parameters than just the prompt. If you're using a piece of shit corpo cloud model, you're locked out of managing your inputs because of UX or whatever.

When you decide to make up your own definition of determinism, you can win any argument. Good job.


Yes, that's my point. Neither driving nor coding with an LLM is perfectly deterministic. You have to learn to deal with different things happening if you want to do either successfully.


> Neither driving nor coding with an LLM is perfectly deterministic.

Funny.

When driving, I can safely assume that when I turn the steering wheel, the car turns in that direction. That the road that was there yesterday is there today (barring certain emergencies; that's why they are emergencies). That the red light in a traffic light means stop, and the green means go.

And not the equivalent of "oh, you're completely right, I forgot to include the wheels, wired the steering wheel incorrectly, and completely messed up the colors".


> Operating a car (i.e. driving) is certainly not deterministic.

Yes. Operating a car or a table saw is deterministic. If you turn your steering wheel left, the car will turn left every time with very few exceptions that can also be explained deterministically (e.g. hardware fault or ice on road).

Operating LLMs is completely non-deterministic.


> Operating LLMs is completely non-deterministic.

Claiming "completely" is mapping a boolean to a float.

If you tell an LLM (with tools) to do a web search, it usually does a web search. The biggest issue right now is more at the scale of: if you tell it to create turn-by-turn directions to navigate across a city, it might create a python script that does this perfectly with OpenStreetMap data, or it may attempt to use its own intuition and get lost in a cul-de-sac.


Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?

The question is about the result of an action. Given the same problem statement in the same codebase it will produce wildly different results even if prompted two times in a row.

Even for trivial tasks the output may vary between just a simple fix, and a rewrite of half of the codebase. You can never predict or replicate the output.

To quote Douglas Adams, "The ships hung in the sky in much the same way that bricks don't". Cars and table saws operate in much the same way that LLMs don't.


> Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?

Your own example was turning a steering wheel.

A web search is as relevant to the broader problems LLMs are good at, as steering wheels are to cars.

> Given the same problem statement in the same codebase it will produce wildly different results even if prompted two times in a row.

Do you always drive the same route, every day, without alteration?

Does it matter?

> You can never predict or replicate the output.

Sure you can. It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw.

You can learn how to deal with reality even when randomness is present, and in fact this is something we're better at than the machines.


> Your own example was turning a steering wheel.

The original example was trying to compare LLMs to cars and table saws.

> Do you always drive the same route, every day, without alteration?

I'm not the one comparing operating machinery (cars, table saws) to LLMs. Again: if I turn a steering wheel in a car, the car turns. If I input the same prompt into an LLM, it will produce different results at different times.

Lol. Even "driving a route" is probably 99% deterministic unlike LLMs. If I follow a sign saying "turn left", I will not end up in a "You are absolutely right, there shouldn't be a cliff at this location" situation.

Edit: and when signs end up pointing to a cliff, or when a child runs onto the road in front of you, these are called emergency situations. Whereas emergency situations are the only available modus operandi for an LLM, and actually following instructions is a lucky happenstance.

> It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw

If you think that throwing more and more bad comparisons that don't work into the conversation somehow proves your point, let me disabuse you of that notion: it doesn't.


There are plenty of non-blockchain, non-NFT, non-online-gambling, non-adtech, non-fascist software jobs. In fact, the vast majority of software jobs are. You can refuse to work with all of these things and not even notice a meaningful difference in career opportunities.

If you refuse to work with AI, however, you're already significantly limiting your opportunities. And at the pace things are going, you're probably going to find yourself constrained to a small niche sooner rather than later.


If your argument is that there are more jobs that require morally dubious developments (stealing people's IP without licensing it, etc.) than jobs that don't, I don't think that's news.

There's always more shady jobs than ethically satisfying ones. There's increasingly more jobs in prediction markets and other sorts of gambling, adtech (Meta, Google). Moral compromise pays.

But if you really think about it and set limits on what is acceptable for you to work on (interesting new challenges, no morally dubious developments like stealing IP for ML training, etc.) then you simply don't have that FOMO of "I am sacrificing my career" when you screen those jobs out. Those jobs just don't exist for you.

Also, people who tag everybody like that as some sort of "anti-AI" tinfoil-hatters are making a straw-man argument. Most people with an informed opinion don't like the ways this tech is applied and rolled out -- ways that are unsustainable and exploitative of ordinary people and of the open-source ecosystem -- along with the confused hype around it, the circular investment, etc.; it's not the underlying tech on its own they object to. Being vocally against these things does not make one an unemployable pariah in the slightest, especially considering that most jobs these days build on open source, and being against license-violating LLMs is being pro sustainable open source.


> There's always more shady jobs than ethically satisfying ones. There's increasingly more jobs in prediction markets and other sorts of gambling, adtech (Meta, Google). Moral compromise pays.

I would say, this is not about the final product, but a way of creating a product. Akin to writing your code in TextPad vs. using VSCode. Imo, having a moral stance on AI-generated art is valid, but one on AI-generated code isn't, just because I don't consider "code" to be "art".

I've been doing it for about 20 or so years at this point, throughout literally every stage of my life. Personally, I'd judge a person who is using AI to copy someone's art, but someone who is using AI to generate code gets a pass from me. That being said, a person who considers code as "art" (I have friends like that, so I definitely get the argument!) would not agree with me.

> Most people with an informed opinion don't like the ways this tech is applied

Yeah, I'm not sure if this tracks. I don't think LLMs are good/proficient as a tool for very specialized or ultra-hard tasks; however, for any boilerplate-coding-task-and-all-CRUD-stuff, they would speed up any senior engineer's task completion.


> I would say, this is not about the final product, but a way of creating a product.

It is the same logic as not wanting to use some blockchain/crypto-related platform to get paid. If you believe it is mostly used for crime, you don't want to use it to get paid to avoid legitimizing a bad thing. Even if there's no doubt you will get paid, the end result is the same, but you know you would be creating a side effect.

If some way of creating a product supports something bad (and simply using any LLM always entails helping train it and benefit the company running it), I can choose another way.


> There's always more shady jobs

That is because your views appear to align with staunch progressives: rejecting conservative politics ("fascism"), AI, advertising, and gambling.

From my side the only thing I would be hesitant about is gambling. The rest is arguably not objectively bad but more personal or political opinion from your side.


There seems to be some confusion. I wouldn't call conservative politics as a whole fascist; that's your choice of words. I doubt that "anti-AI progressive" is a thing, either.

> The rest is arguably not objectively bad but more personal or political opinion from your side.

Nothing is objectively bad. Plenty of people argue that gambling should be legal, if only on the basis of personal freedom. All of this is a matter of personal choice.

(Incidentally, while you are putting people in buckets like that, note that one person can very much be simultaneously against gambling and drug legalization and be a pro-personal-freedom, open-source libertarian maximalist. Things are much more nuanced than “progressive” vs. “conservative”; whatever you put in those buckets is on you.)


That's fair enough.

It is just that, from my experience, political discussions online are very partisan. "Fascism" in relation to the current US government, combined with anti-AI sentiment, is almost always a sure indicator of a certain bucket of politics.

Maybe I am spending too much time on Reddit.


To play devil's advocate: all the people using AI are not being significantly more productive on brownfield applications. If GP manages to find a Big Co (tech or non-tech) which doesn't particularly care about AI usage, just about delivering features, and where the bottleneck is not software dev (as is the case in the majority of old-school companies), he/she would be fine.


If your bottleneck is not typing speed, you'll be fine.


Yeah, it's apparently pulling in over $800K in annual revenue [1].

EDIT: Doing the math on the sponsor list, it's probably around $1M in ARR now.

[1] https://petersuhm.com/posts/2025/


I'm sorry but it simply does not cost a million dollars to maintain Tailwind, a CSS library that has no compelling reason to ever change at this point.


> Electricity prices near data centers go up, right?

I hear this a lot, but the most comprehensive study I've seen found the opposite -- that retail electricity prices tend to decrease as load (from datacenters and other consumers) increases [1].

The places where electricity prices have increased the most since 2019 (California, Hawaii, and the Northeast) are not places where they're building a lot of new datacenters.

[1] https://www.sciencedirect.com/science/article/pii/S104061902...

