Hacker News | new | past | comments | ask | show | jobs | submit | abathur's comments | login

I think both of those experiments do a good job of demonstrating utility on a certain kind of task.

But this is cherry-picking.

In the grand scheme of the work we all collectively do, very few programming projects entail something even vaguely like generating an Nth HTML parser in a language that already has several wildly popular HTML parsers--or porting that parser into another language that has several wildly popular HTML parsers.

Even fewer tasks come with a library of 9k+ tests to sharpen our solutions against. (Which itself wouldn't exist without experts treading this ground thoroughly enough to accrue them.)

The experiments are incredibly interesting and illuminating, but I feel like it's verging on gaslighting to frame them as proof of how useful the technology is when it's hard to imagine a more favorable situation.


> "it's hard to imagine a more favorable situation"

Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".

I think it's a strong enough example to disprove "they're an interesting phenomenon that people have convinced themselves MUST BE USEFUL ... either through ignorance or a sense of desperation". Not enough to claim they are always useful in all situations or to all people, but I wasn't trying for that. You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim. And I don't think you can. He isn't hyping an AI startup, he has no profit motive to delude him. He isn't a non-technical business leader who can't code being baffled by buzzwords. He isn't new to LLMs and wowed by the first thing. He gave a conference talk showing that LLMs cannot draw pelicans on bicycles so he is able to admit their flaws and limitations.

> "But this is cherry-picking."

Is it? I can't use an example where they weren't useful or failed. It makes no sense to try and argue how many successes vs. failures, even if I had any way to know that; any number of people failing at plumbing a bathroom sink don't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden. If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?


> Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".

Chuffed you picked this example to ~sneer about.

There's a near-infinite list of problems one can solve with a hammer, but there are vanishingly few things one can build with just a hammer.

> You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim.

I don't have to do any such thing.

I said the experiments were both interesting and illuminating and I meant it. But that doesn't mean they will generalize to less-favorable problems. (Simon's doing great work to help stake out what does and doesn't work for him. I have seen every single one of the posts you're alluding to as they were posted, and I hesitated to reply here because I was leery someone would try to frame it as an attack on him or his work.)

> Is it? I can't use an example where they weren't useful or failed.

  https://en.wiktionary.org/wiki/cherry-pick

  (idiomatic) To pick out the best or most desirable items
  from a list or group, especially to obtain some advantage
  or to present something in the best possible light. 

  (rhetoric, logic, by extension) To select only evidence which supports an argument, 
  and reject or ignore contradictory evidence. 
> any number of people failing at plumbing a bathroom sink don't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden.

This smells like sleight of hand.

I'm happy to grant this (with a caveat^) if your point is that this success proves LLMs can build an HTML parser in a language with several popular source-available examples and thousands of tests (and probably many near-identical copies of the underlying HTML specs as they evolve) with months of human guidance^ and (with much less guidance) rapidly translate that parser into another language with many popular source-available answers and the same test suite. Yes--sure--one example of each is proof they can do both tasks.

But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses.

^Simon, who you noted is not ignorant about LLMs and programming, was clear that the initial task of getting an LLM to write the first codebase that passed this test suite took Emil months of work.

> If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?

The only part of this that appears to have been done for about $30 was the translation of the existing codebase. I wouldn't argue that accomplishing this task for $30 isn't impressive.

But, again, this smells like sleight of hand.

We have probably plumbed billions of sinks (and hopefully have billions or even trillions more to go), so any automation that can do one for $30 has clear value.

A world with a billion well-tested HTML parsers in need of translation is likely one kind of hell or another. Proof an LLM-based workflow can translate a well-tested HTML parser for $30 is interesting and illuminating (I'm particularly interested in whether it'll upend how hard some of us have to fight to justify the time and effort that goes into high-quality test suites), but translating them obviously isn't going to pay the bills by itself.

(If the success doesn't generalize to less favorable situations that do pay the bills, this clearly valuable capability may be repriced to better reflect how much labor and risk it saves relative to a human rewrite.)


> "Yes--sure--one example of each is proof they can do both tasks."

Therefore LLMs are useful. Q.E.D. The claim "people who say LLMs are useful are deluded" is refuted. Readers can stop here, there is no disagreement to argue about.

> "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."

Not exactly; it's common to see people dismiss internet claims of LLMs being useful. Here[1] is a specific dismissal I have in mind, where various people claim that LLMs are useful and the HN commenter investigates and says the LLMs are useless, the people are incompetent, and others are hand-writing a lot of the code. No data is provided for the readers to make any judgement one way or the other. Emil taking months to create the Python version could be dismissed this way as well, assuming a lot of hand-writing of code in that time. Small scripts can be dismissed with "I could have written that quickly" or "it's basically regurgitating from StackOverflow".

Simon Willison's experiment is a more concrete example. The task is clearly specified, not vague architecture design. The task has a clear success condition (the tests). It's clear how big the task is and it's not a tiny trivial toy. It's clear how long the whole project took and how long GPT ran for, there isn't a lot of human work hiding in it. It ran for multiple hours generating a non-trivial amount of work/code which is not likely to be a literal example regurgitated from its training data. The author is known (Django, Datasette) to be a competent programmer. The LLM code can be clearly separated from any human involvement.

Where my GP was going is that the experiment is not just another vague anecdote, it's specific enough that there's no room left for dismissing it how the commenter in [1] does. It's untenable to hold the view that "LLMs are useless" in light of this example.

> (repeat) "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."

The example is not proof that these things can do anything else, but why would you assume they can't do tasks of similar complexity? Over time we've gone from "LLMs don't exist" to "LLMs exist as novelties and toys (GPT-1, 2018)" to "LLMs might be useful but might not be". If things keep progressing we will get to "LLMs are useful". I am taking the position that we are already past that point, and I am arguing that position: we are definitely into the "they are useful" era. Other people have believed that for a long time. Not just useful for that one task, but for tasks of that kind of complexity.

Sometime between GPT-1 babbling (2018) and today (Q4 2025) the GPTs and the tooling improved from not being able to do this task to yes being able to do this task. Some refinement, some chain of thought, some enlarged context, some API features, some CLI tools.

Since one can't argue that LLMs are useless by giving a single example of a failure, to hold the view that LLMs are useless, one would need to broadly dismiss whole classes of examples by the techniques in [1]. This specific example can't be dismissed in those ways.

> "If the success doesn't generalize to less favorable situations that do pay the bills"

Most bill-paying code in the world is CRUD, web front end, business logic, not intricate parsing and computer science fundamentals. I'm expecting that "AI slop" is going to be good enough for managers no matter how objectionable programmers find it. If I order something online and it arrives, I don't care if the order form was Ruby on Rails emailing someone who copied the order docs into a Google Spreadsheet using an AI-generated If This Then That workflow. And as long as the error rate and credit card chargeback rate are low enough, neither will the company owners. There are surely tons of examples of companies having very poor systems and still being in business, but I don't have any specific ones to hand, so I wouldn't argue this vehemently. Still, the world isn't waiting for LLMs to be as 'useful' as HN commenters are waiting for, before throwing spaghetti at the wall and letting 'Darwinian Natural Selection' find the maximum level of slop the markets will tolerate.

----

On that note, a pedantic bit about cherry-picking: there's a difference between cherry-picking as a thing, and cherry-picking as a logical fallacy / bad-faith argument. E.g. if someone claims "Plants are inedible" and I point to cabbage and say it proves the claim is false, you say I'm cherry-picking cabbage and ignoring poisonous foxgloves. However, foxgloves existing - and a thousand other inedible plants existing - does not make edible cabbage stop existing. Seeing the ignored examples does not change the fact that the conclusion "plants are inedible" is false, so ignoring those things was not bad. Similarly "I asked GPT5 to port the Linux kernel to Rust and it failed" does not invalidate the html5 parser port.

Definition 2 is bad form; e.g. saying "smoking is good for you, here is a study which proves it" is a cherry-picking fallacy, because if the ignored studies were seen, they would counter the claim "smoking is good for you". Hiding them is, deceptively, part of the argument.

"LLMs are useless and only a deluded person would say otherwise" is an example of the former; it's countered by a single example of a non-deluded person showing an LLM doing something useful. It isn't a cherry-picking fallacy to pick one example because no amount of "I asked ChatGPT to port Linux to Rust and it failed" makes the HTML parser stop existing and doesn't change the conclusion.

[1] https://news.ycombinator.com/item?id=45560885


I've been tasked with doing a very superficial review of a codebase produced, with the assistance of a well-known agent, by an adult who purports to have decades of database/backend experience.

While skimming tests for the python backend, I spotted the following:

    @patch.dict(os.environ, {"ENVIRONMENT": "production"})
    def test_settings_environment_from_env(self) -> None:
        """Test environment setting from env var."""
        from importlib import reload

        import app.config

        reload(app.config)

        # Settings should use env var
        assert os.environ.get("ENVIRONMENT") == "production"

This isn't an outlier. There are smells everywhere.
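For contrast, a test of that shape only has value if it asserts on the loaded settings rather than on the env var it just patched. A minimal sketch of the difference, using a hypothetical `load_settings` stand-in (the project's real `app.config` isn't shown here):

```python
import os
from unittest.mock import patch

# Hypothetical stand-in for the project's config module: settings are
# read from the environment at load time.
def load_settings() -> dict:
    return {"environment": os.environ.get("ENVIRONMENT", "development")}

def test_settings_environment_from_env() -> None:
    """Assert on the thing under test (the settings), not on the patch."""
    with patch.dict(os.environ, {"ENVIRONMENT": "production"}):
        settings = load_settings()
        assert settings["environment"] == "production"

test_settings_environment_from_env()
```

The quoted test only checks that `patch.dict` did what `patch.dict` always does; it would still pass if `app.config` ignored the variable entirely.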


If it is so obvious to you that there is a smell here, then an agent would have caught it. Try it yourself.



This is the kind of dismissive sneer the HN guidelines advise against.

You can write dev docs for humans and still want machine readability (without caring about whether some LLM can make sense of the docs).

Machine readability is how you repurpose your own documentation in different contexts. If your documentation isn't machine readable, it might as well be in a .doc(x) file.


There's quite a lot of pricing data available for the energy market and it might be possible to approximate battery profitability by rerunning normal and long-tail history.

See https://www.ercot.com/mktinfo/prices and https://www.ercot.com/gridmktinfo/dashboards and https://www.ercot.com/gridmktinfo/dashboards/energystoragere... for example.
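As a toy illustration of the rerun-history idea (the prices below are made up, not real ERCOT data, and real backtests would model charge limits, degradation, and fees): a one-cycle-per-day arbitrage model gives a crude revenue estimate from hourly prices.

```python
# Toy backtest: daily arbitrage revenue for a 1 MWh battery given
# one day of hourly prices ($/MWh). Assumes one full charge/discharge
# cycle per day at the cheapest and priciest hours.
def daily_arbitrage_revenue(hourly_prices, efficiency=0.85):
    """Buy at the day's minimum price, sell at its maximum,
    discounted by round-trip efficiency."""
    buy = min(hourly_prices)
    sell = max(hourly_prices)
    return sell * efficiency - buy

# Hypothetical hourly prices for one day, with an evening spike.
prices = [25, 22, 20, 21, 24, 30, 45, 60, 55, 40, 35, 30,
          28, 27, 26, 30, 50, 120, 90, 70, 50, 40, 32, 28]

print(round(daily_arbitrage_revenue(prices), 2))  # → 82.0
```

Summing that over years of real settlement-point price history, including the long-tail events, is the kind of profitability approximation I have in mind.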


Agriculture feeds people, Simon.

It's fair to be critical of how the ag industry uses that water, but a significant fraction of that activity is effectively essential.

If you're going to minimize people's concern like this, at least compare it to discretionary uses we could ~live without.

The data's about 20 years old, but for example https://www.usga.org/content/dam/usga/pdf/Water%20Resource%2... suggests we were using over 2b gallons a day to water golf courses.


The vast majority of water in agriculture goes to satisfy our taste buds, not nourish our bodies. Feed crops like alfalfa consume huge amounts of water in the desert southwest, but the desert climate makes it a great place to grow them and people have an insatiable demand for cattle products.

We could feed the world with far less water consumption if we opted not to eat meat. Instead, we let people make purchasing decisions for themselves. I'm not sure why we should take a different approach when making decisions about compute.


> We could feed the world with far less water consumption if we opted not to eat meat.

If you look at the data for animals, that’s not really true. See [1] especially page 22 but the short of it is that the vast majority of water used for animals is “green water” used for animal feed - that’s rainwater that isn’t captured but goes into the soil. Most of the plants used for animal feed don’t use irrigation agriculture so we’d be saving very little on water consumption if we cut out all animal products [2]. Our water consumption would even get a lot worse because we’d have to replace that protein with tons of irrigated farmland and we’d lose the productivity of essentially all the pastureland that is too marginal to grow anything on (50% of US farmland, 66% globally).

Animal husbandry has been such a successful strategy on a planetary scale because it’s an efficient use of marginal resources no matter how wealthy or industrialized you are. Replacing all those calories with plants that people want to actually eat is going to take more resources, not less, especially when you’re talking about turning pastureland into productive agricultural land.

[1] https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1...

[2] a lot of feed is also distiller’s grains, used for ethanol first before being fed to animals, so we wouldn't even cut out most of that


That paper makes the opposite argument to the one you thought it made. Even from a freshwater perspective it’s more water-efficient to get calories/protein/fat directly from crops than from animal products.

Since you like their work, the authors of your paper answered that question more generally here https://link.springer.com/article/10.1007/s10021-011-9517-8 where they conclude "The water footprint of any animal product is larger than the water footprint of crop products with equivalent nutritional value".

You make some often debunked claims, like we'd have to plant more crops to feed humans directly if we stopped eating meat.

This shouldn't make intuitive sense to you since animals eat feed grown on good cropland (98% of the water footprint of animal ag) that we could eat directly, and we lose 95% of the calories when we route crops through animals.
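The arithmetic behind that, using only the numbers above (the unit of water is arbitrary; this is an illustration, not a measurement):

```python
# Routing crops through animals loses ~95% of the calories, so the
# water embodied per animal calorie is ~20x the crop's.
feed_water_per_kcal = 1.0   # water per crop calorie, arbitrary unit
trophic_efficiency = 0.05   # ~5% of feed calories reach us as meat

meat_water_per_kcal = feed_water_per_kcal / trophic_efficiency
print(round(meat_water_per_kcal, 1))  # → 20.0
```

That 20x multiplier is the same order as the per-calorie beef figures discussed in this thread.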


You’re misconstruing my argument. I’m not arguing that meat is more efficient than plants; that by itself is obviously untrue because of the trophic efficiency loss. I’m arguing that meat uses marginal resources that would otherwise be useless for growing food. If we eliminate meat we’d eliminate a lot of marginally productive farmland that is naturally irrigated, and have to replace it with industrialized irrigation farming that will be significantly more expensive and use more water and energy. Maybe that could work after a global cultural reset around industrialized farming, but no one is going to be happy replacing their beef and chicken with the corn and barley and hay that those lands normally grow, much of it inedible to humans.

That paper isn’t actually debunking anything that I’m saying. If the water footprint per calorie is 20x for beef but the feed is grown with 90% of its water from rainfall, that’s not a 20x bigger footprint in a way that practically matters, because most of that water is unrecoverable anyway. The water that is recoverable just makes its way through the watershed.

Meat is a way to convert land that can’t grow things people can or want to eat into things that people will eat. That pastureland and marginal cropland growing animal feed can’t just be converted to grow more economically productive crops like fruit and vegetables without Herculean engineering effort and tons of water and fertilizer. Instead the farming would have to stress other fertile ecosystems like the Southwest, which would make the water problems worse, even if their total “footprint” is smaller. The headline that beef uses 20x more water per calorie completely ignores where that water comes from and how useful it actually is to us.

I don’t doubt that we can switch to an all plant diet as a species but people vastly underestimate the ecological and societal cost to do so.


I mean it's even simpler. Almonds are entirely non-essential to the food supply (many other nuts are more water-efficient), and in California they consume more water than the entire industrial sector, and a bit more than all residential usage (~5 million acre-feet of water).

Add a datacenter tax of 3x to water sold to datacenters and use it to improve water infrastructure all around. Water is absolutely a non-issue medium term, and is only a short term issue because we've forgotten how to modestly grow infrastructure in response to rapid changes in demand.


Thanks for the sanity!! I wish more people understood this


Ask people which they'd rather have: (a) no more meat and a little better AI, or (b) keep their meat and AI doesn't improve from where it is today.


I called out golf in my first comment in this thread.

If data center usage meant we didn't have enough water for agriculture I would shout that from the rooftops.


Yep--I'm agreeing that one's a good comparison to elaborate on.

Exploring how it stacks up against an essential use probably won't persuade people who perceive it as wasteful.


Growing almonds is just as essential as building an AI. Eating beef at the rate Americans do is not essential. That's where basically all the water usage is going.


Agriculture is generally essential but that doesn't mean that any specific thing done in the name of agriculture is essential.

If Americans cut their meat consumption by 10%, we would use a lot less water in agriculture and probably also live longer in general.


Iran's ongoing water crisis is an example. One cause of it is unnecessary water-intensive crops that they could have imported or done without (just consume substitutes).

It's a common reasoning error to bundle up many heterogeneous things into a single label ("agriculture!") and then assign value to the label itself.


I'll cop to not reading the whole list before commenting, but I skimmed this and didn't really notice anything about speed or performance.

When using tools that can emit 0 to millions of lines of output, performance seems like table-stakes for a professional tool.

I'm happy to see people experiment with the form, but to be fit for purpose, I suspect the feature set a shell or terminal supports should be worked out backwards from benchmarks and human testing: understand how much headroom there is on the kind of hardware you'd like to support, and which features fit inside it.
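A crude way to probe that headroom (a rough sketch, not a rigorous benchmark): compare how long a burst of output takes with and without the terminal actually rendering it.

```shell
# Generation cost alone: a million lines, rendering skipped.
time seq 1 1000000 > /dev/null

# Same output with the terminal rendering it. The difference is
# roughly the terminal's share of the cost for this workload.
time seq 1 1000000
```

Running that in a few candidate terminals, on the slowest hardware you intend to support, gives a first-order sense of how much budget is left for fancier features.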


This is probably crazy talk, but I have been wondering how requiring people to slap a stamp on an envelope and mail in a résumé would go.


I don’t think it is crazy, and I have suggested before that there needs to be some sort of proof of work on the candidate side to prevent resume spam.

I think your idea is very elegant as everyone has access to the mail system, an actual stamp is pretty cheap, but it is just enough hassle to mail an application that it will filter out some of the spam.

The other suggestion I have had is that candidates need to hand in the resume in person, but I guess you could accept resumes from both mail and in person drop offs.


> The other suggestion I have had is that candidates need to hand in the resume in person

This might be a bigger lift than asking for a take-home project; if I'm expected to drop off a resume in Manhattan that's a minimum of a two hour trip for me. I'd rather spend two hours banging together a CRUD app to show that I can actually write code.


I read this as applying to in-office roles. If I were willing to commute, then it's a good chance to exercise the lift required.


> but it is just enough hassle to mail an application that it will filter out some of the spam.

More than that, bots currently have no way of sending snail-mail :-)


The only time I had to hire somebody, the university I was working for in Switzerland made it mandatory for the candidates to send their application via mail, not email. That was back in 2014. I found this odd at the time, but I'm pretty sure it made my job way easier (fewer applications to review, motivated/serious candidates, etc.).


The em dash is more than a synonym for the semicolon.


When you say library system, do you mean something more or less like a separate search path and tools for managing it?

I've written a little about how we can more or less accomplish something like meaningfully-reusable shell libraries in the nix ecosystem. I think https://www.t-ravis.com/post/shell/the_missing_comprehensive... lays out the general idea but you can also pick through how my bashrc integrates libraries/modules (https://github.com/abathur/bashrc.nix).

(I'm just dropping these in bin, since that works with existing search path for source. Not ideal, but the generally explicit nature of dependencies in nix minimizes how much things can leak where they aren't meant to go.)


> do you mean something more or less like a separate search path

Exactly. I sent patches to the GNU Bash mailing list that implement literally this. It worked like this:

  # searches for the `some-module` file
  # in some separate PATH just for libraries
  # for example: ~/.local/share/bash/modules

  source -l some-module

At some point someone said this was a schizophrenic idea and I just left. Patches are still on the mailing list. I have no idea what the maintainer did with them.


It seems reasonable to me. Sorry you got that reaction. Just from the mailing list thread size alone, it looks like you put quite a lot of work into it.

I wasn't readily able to find where the discussion broke down, but I see that there's a -p <path> flag in bash 5.3.


> Just from the mailing list thread size alone, it looks like you put quite a lot of work into it.

The patch itself was pretty simple, it's just that the community argued a lot about this feature even though I validated the idea in the mailing list before I even cloned the repository. I remember arguing with one person for days only to learn later he hadn't even read the code I sent.

> I wasn't readily able to find where the discussion broke down

Somewhere around here:

https://lists.gnu.org/archive/html/bug-bash/2024-05/msg00352...

The maintainer was pretty nice, he was just looking for community consensus.

> I see that there's a -p <path> flag in bash 5.3

That's the solution the maintainer favored back then:

https://lists.gnu.org/archive/html/bug-bash/2024-05/msg00333...

https://lists.gnu.org/archive/html/bug-bash/2024-05/msg00337...

So in the end it did make it into bash in some form. It's not very ergonomic by default but that can be fixed by wrapping it in an alias or function. Good enough.
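For example, a hypothetical `module` wrapper along these lines (the `BASH_MODULE_PATH` variable and the fallback search are my assumptions; `source -p` is the bash 5.3 feature):

```shell
# Hypothetical ergonomics wrapper. BASH_MODULE_PATH is an assumed
# colon-separated convention, defaulting to the directory used in
# the original patches.
module() {
  local name=$1 dir
  local path=${BASH_MODULE_PATH:-$HOME/.local/share/bash/modules}
  # bash >= 5.3: let `source -p` do the path search.
  if source -p "$path" "$name" 2>/dev/null; then
    return 0
  fi
  # Older bash: emulate the search manually.
  local IFS=:
  for dir in $path; do
    if [ -f "$dir/$name" ]; then
      source "$dir/$name"
      return 0
    fi
  done
  printf 'module not found: %s\n' "$name" >&2
  return 1
}
```

Then `module some-module` behaves roughly like the `source -l some-module` from the original patches.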

Looks like 5.3 was released just over a month ago too so pretty recent. I'm gonna update and start using it right away. Thanks, Chet Ramey!


Chet seems like good folks from everything I've seen.

