Hacker News | jdlshore's comments

This is self-reported productivity, in that devs are saying AI saves them about 4 hours per week. But let’s not forget the METR study that found a 20% increase in self-reported productivity but a 19% decrease in actual measured productivity.

(It used a clever and rigorous technique for measuring productivity differences, BTW, for anyone as skeptical of productivity measures as I am.)


Let's also not forget the multiple other studies that found significant boosts to productivity using rigorous methods like RCTs.

However, because these threads always go the same way whenever I post this, I'll link to a previous thread in hopes of preempting the same comments and advancing the discussion! https://news.ycombinator.com/item?id=46559254

Also, DX (whose CTO was giving the presentation) actually collects telemetry-based metrics (PRs, etc.) as well: https://getdx.com/uploads/ai-measurement-framework.pdf

It's not clear from TFA if these savings are self-reported or from DX metrics.


https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

That info is from mid-2025, talking about models released in Oct 2024 and Feb 2025. It predates tools like Claude Code and Codex, and Lovable was at a third of its current ARR at the time.

This might still be true but we desperately need new data.


None of those changes address the issue jdlshore is pointing out: self-assessed developer productivity increases from LLMs are not a reliable indication of actual productivity increases. It's true that modern LLMs might have less of a negative impact on productivity, or even increase it, but you won't be able to tell by asking developers whether they feel more productive.

(Also, Anthropic released Claude Code in February 2025, which was near the start of the period the study ran.)


Yeah, new data would be great, but I feel like these tools are not substantively better, and this is becoming the new "it's different this time!"

Has the METR study been replicated?

Not a scientific study, but someone did replicate the experiment on themselves [0] and found that, in their case, any effect from LLM use wasn't detectable in their sample. Notably, they almost certainly had more experience with LLMs than most of the METR participants did.

[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...


I haven’t heard about any similar studies, no. I’m planning to conduct one at my workplace but we’re still deciding exactly which uses of AI to test.

Doesn’t look like goal-post moving to me. GP argued that AI isn’t making a difference, because if it was, we’d see amazing AI-generated open source projects. (Edit: taking a second look, that’s not exactly what GP said, but that’s what I took away from it. Obviously individuals create open source projects all the time.)

You rebutted by claiming 4% of open source contributions are AI generated.

GP countered (somewhat indirectly) by arguing that contributions don't indicate quality, and thus weren't sufficient to qualify as "amazing AI-generated open source projects."

Personally, I agree. The presence of AI contributions is not sufficient to demonstrate “amazing AI-generated open-source projects.” To demonstrate that, you’d need to point to specific projects that were largely generated by AI.

The only big AI-generated projects I’ve heard of are Steve Yegge’s GasTown and Beads, and by all accounts those are complete slop, to the point that Beads has a community dedicated to teaching people how to uninstall it. (Just hearsay. I haven’t looked into them myself.)

So at this point, I’d say the burden of proof is on you, as the original goalposts have not been met.

Edit: Or, at least, I don’t think 4% is enough to demonstrate the level of productivity GP was asking for.


It has been argued for a very long time that lines of code are largely meaningless as a metric. But now that AI is writing those lines... it seems to be meaningful again? I continue to be optimistically skeptical.

It's not a great ask. Who's going to quantify what counts as 'amazing open source work'?

4% for a single tool used in a particular way (many out there are using AI tools in a way that doesn't make it clear the code was AI-authored) is an incredible amount. I don't see how you can look at that and see 'not enough'.

The vast majority of people using these tools aren't announcing it to the world. Why would they? They use it, it works, and that's that.


So we're suddenly going back to treating lines of code as a useful metric?

Just because people are shitting out endless slop code that they never bothered to throw a second glance at doesn't mean it's good or that it's leading to better projects or tools; it literally just means people are pushing code out haphazardly. If I made a Python script that everyone started using, and all it did was create a repo, commit a README, and push it every 5 seconds (see the sketch below), we'd be seeing billions of lines of code added! But none of it would be useful in any way.
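To make the hypothetical concrete, here is a minimal sketch of such a script. Everything in it is made up for illustration: the repo name is arbitrary, the push is commented out because it would need a real remote, and it assumes git is installed with user.name/user.email configured.

    # Hypothetical metric-gaming script: appends a meaningless line to a
    # README and commits it every 5 seconds. Nothing about it is useful;
    # it exists only to inflate commit and line counts.
    import subprocess
    import time

    REPO = "loc-spam"  # hypothetical local repo name
    subprocess.run(["git", "init", REPO], check=True)

    n = 0
    while True:
        n += 1
        with open(f"{REPO}/README.md", "a") as f:
            f.write(f"meaningless line {n}\n")  # one more "line of code"
        subprocess.run(["git", "-C", REPO, "add", "README.md"], check=True)
        subprocess.run(["git", "-C", REPO, "commit", "-m", f"update {n}"], check=True)
        # subprocess.run(["git", "-C", REPO, "push"], check=True)  # needs a real remote
        time.sleep(5)

Every iteration adds to the line count while producing nothing of value, which is exactly the point: volume alone tells you nothing.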

Same with AI, sure we're generating endless piles of code, but how much of it is actually leading to better software?


It's not the be-all and end-all, and there are obviously issues with using it alone, but it's rather silly to go to the other extreme and pretend it isn't a major factor.

>If I made a python script that everyone started using and all it did was create a repo, commit a README and push it every 5 seconds we'd be seeing

1. Well you can't do that

2. Something like that won't register as Claude Code (or any other AI tool) usage anyway

3. Something like that won't come anywhere near 4%


> Well you or anyone else can't do that...

But that's what these tools are doing in a large number of cases? At least the end result is basically the same. Like that Clawdbot (or whatever name they've decided to try to ride the coattails with) guy who has 70k commits in the last few months, which I saw being touted as an impressive feat on HN the other day. How much broken, unusable code exists within those 70k commits that ultimately would've had the same effect as if he had just pushed a `--allow-empty` commit thousands of times?
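For reference, `--allow-empty` is a real git flag that permits commits with no changes at all, so a commit count like that is trivially scriptable. A minimal hypothetical sketch, assuming an existing repo with git configured:

    # Hypothetical commit-count inflation: thousands of commits, zero content.
    # REPO is made up; assumes an existing git repo with user.name/email set.
    import subprocess

    REPO = "coattails"  # hypothetical local repo
    for i in range(70_000):
        subprocess.run(
            ["git", "-C", REPO, "commit", "--allow-empty", "-m", f"noise {i}"],
            check=True,
        )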

Now, whatever: if it's people pushing slop into their own codebase that they own, more power to them. My issue stems from OSS projects being inundated with endless spam MRs/PRs from AI hypesters. It's just making maintainers' lives more difficult, and the most annoying part of it all is that they have to put up with people who don't see the effort disparity between prompting a chatbot to write up some broken bullshit and the effort required for maintainers to keep up with the spam. It hurts the maintainers, it hurts genuine beginners who would like to learn and contribute to projects, and it hurts the projects themselves, since they have to waste what precious little time and resources they already have digging through crap. It hurts quite literally everyone who has to deal with it, other than the selfish AI-using morons who take a huge dump over everyone and spout shit like "Well, 4% of all code on GitHub is now AI-generated!" as if more of that were somehow a good thing.


>But that's what these tools are doing, in a large number of cases?

I mean, no, not really. I'm not sure why you think that.

>How much broken, unusable code exists within those 70k commits that ultimately would've had the same effect as if he had just pushed a `--allow-empty` commit thousands of times?

How much stable, usable code exists within those 70k commits?

This is pretty much exactly why I said the original question was not a great ask. You have your biases. Show an example, and the default response for some, almost like a stochastic parrot, is "Must be slop!" How do you know? Did you examine it? No, you didn't. That's just what you want to believe, so it must be true. It makes for a very frustrating conversation.


In those 15 years, did you send them any money? I suspect the point is “we don’t make enough money to keep writing cool articles that keep people coming back for 15 years.”

I would think it's pretty obvious that's not the question I am asking. What is the point of making a page I can't even look at to see the ads?

Yes, because solar is now the most cost-effective form of energy generation. That's why grid-scale solar is being deployed on a massive scale worldwide.

The new OS (with Liquid Glass) has SMS message categorization that works fine for filtering spam, IME. You still have to delete them if you don't want the red dot, but at least I don't get alerts any more.

I upgraded iOS just for this feature and am glad I did. Not a fan of Liquid Glass, though.


Why would there be something about browser development? This is obviously an effort to diversify their revenue streams.


Nothing on that list screams revenue.


What are you talking about? They said "AI" three times!

/s


This might have been “The Leprechauns of Software Development,” by Laurent Bossavit, which is one of my favorites. He digs into the sources behind a bunch of the popular sayings and determines that they’ve all been greatly exaggerated and/or misquoted.


yes, that was the book I was referring to


Only if Tesla is able to roll out a competing service. Given that they have zero cars without a safety driver on public roads, I’d say they’re a very long way from doing so, and I have my doubts about their ability to do so at all. Their CEO talks big but doesn’t deliver.


Yes, they do. I know because they’re disclosed, and how they’re used is disclosed. (They’re disabled unless you call support.)


Not a lawyer, but this seems like a pretty direct violation of copyright law. Even if AI’s use of it isn’t infringing, copying the files for upload seems like it must be. Scrubbing proprietary data doesn’t mean they’re suddenly public domain.

Who’s on the hook here, though? The contractor uploading the file is the actual person doing the copying. Or is OpenAI culpable for directing them to do so?


According to my understanding (IANAL), you don't have to go that far into copyright law. I'd guess it's already a breach of their work contract in the very first second, just by copying it and "carrying it out of the building", regardless of whether they spread it on the internet later on?

Actually two breaches then, if I'm not wrong?

