Cheating Is All You Need (sourcegraph.com)
73 points by sebg 11 months ago | 114 comments


I'm 100% an AI skeptic. I don't think we're near AGI, at least not by any reasonable definition of it. I don't think we're remotely near a superintelligence or the singularity or whatever. But that doesn't matter. These tools will change how work is done. What matters is what he says here:

> Peeps, let’s do some really simple back-of-envelope math. Trust me, it won’t be difficult math.

> You get the LLM to draft some code for you that’s 80% complete/correct.

> You tweak the last 20% by hand.

> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.

The hard part is the engineering, yes, and now we can actually focus on it. 90% of the work that software engineers do isn't novel. People aren't sitting around coding complicated algorithms that require PhDs. Most of the work is doing pretty boring stuff. And even if LLMs can only write 50% of that you're still going to get a massive boost in productivity.

Of course this isn't always going to be great. Debugging this code is probably going to suck, and even with the best engineering it'll likely accelerate big balls of mud. But it's going to enable a lot of code to get written by a lot more people and it's super exciting.


> But it's going to enable a lot of code to get written by a lot more people and it's super exciting.

Is that exciting though? A lot of the code we've currently got is garbage. We are already slogging through a morass of code. I could be wrong, but I don't think LLMs are going to change this trend.


It is a GREAT time to slowly get into the consulting side of our industry. There will be so much mess that I think there will be troves of opportunity to come in on a contract and fix it. There always was, but there will be a ton more!


> A lot of the code we've currently got is garbage.

That is why I have never understood the harsh criticism of quality of LLM generated code. What do people think much of the models were trained on? Garbage in -> Garbage out.


I think it's exciting that more people will be able to build more things. I agree though that most code is garbage, and that it's likely going to get worse. The future can be both concerning and exciting.


Personal anecdote: I pushed some LLM generated code into Prod during a call with the client today. It was simple stuff, just a string manipulation function, about 10 LOC. Absolutely could have done it by hand but it was really cool to not have to think about the details - I just gave some inputs and desired outputs and had the function in hand and tested it. Instead of saying "I'll reply when the code is done" I was able to say "it's done, go ahead and continue" and that was a cool moment for me.
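For a sense of scale, the kind of function I mean is roughly this (a made-up Python example, not the actual client code):

    import re

    def slugify(title: str) -> str:
        """Turn an arbitrary title into a URL-safe slug."""
        s = title.strip().lower()
        s = re.sub(r"[^a-z0-9]+", "-", s)  # collapse runs of non-alphanumerics into a dash
        return s.strip("-")                # drop any leading/trailing dashes

    # The "desired outputs" I handed over were basically assertions like these:
    assert slugify("  Hello, World!  ") == "hello-world"
    assert slugify("Q3 Report (Final)") == "q3-report-final"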


That's the perfect application for llm code! It is a well defined problem, with known inputs and outputs, and probably many examples the llm slurped up from open source code.

And that isn't a bad thing! Now us engineers need to focus on the high level stuff, like what code to write in the first place.


> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.

To state the obvious -- only if all units of work are of equal difficulty.


The last 20% always takes 80% of the effort, of course.


The original calculation is not even right, because writing code isn't anywhere near 100% of the work software engineers do. A lot of the time they're debugging code or attending some kind of meeting. So if only 20% of their time is coding and they now do 1/5th of it, they save about 16% of their time, which works out to something like a 19% productivity gain overall.

Now factor in the expense of using LLMs and it's not worth it; the math isn't mathing. You can't even squeeze out a 2x improvement, let alone the 5-10x that has become expected in the tech industry.
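To make the arithmetic concrete (and assuming coding really is 20% of the job and the LLM really does make that part 5x faster):

    coding_share = 0.20   # fraction of the week spent actually writing code
    llm_speedup = 5.0     # generous: the LLM makes that part 5x faster

    # Amdahl's law: only the coding fraction gets faster.
    new_time = (1 - coding_share) + coding_share / llm_speedup  # 0.80 + 0.04 = 0.84
    overall = 1 / new_time                                      # ~1.19

    print(f"time saved: {1 - new_time:.0%}")    # 16%
    print(f"overall speedup: {overall:.2f}x")   # ~1.19x, nowhere near 5x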


Also absent in these discussions is who maintains the 80% that wasn't written by them when it needs to be fixed or adapted to a new use case. This is where I typically see "fast" AI prototypes fall over completely. Will that get better? I'm ambivalent, but could concede "probably". At the end of the day, though, I don't think you can get away from the fact that for every non-trivial programming task there needs to be a human somewhere to verify/validate the work, or at the very least understand it enough to debug it. This seems like an inescapable fact.


And one thing people don't realize is that if you're working in a codebase heavily coded by AI, you're going to want to see the original prompts used if you don't know the original engineer's intent. And no one seems to track this. It becomes a game of Chinese whispers.

Honestly this quickly becomes a shit show and I don’t know how anyone can seriously consider having a large amount of a codebase coded by AI.


> And one thing people don’t realize is that if you’re working in a codebase heavily coded by AI, you’re going to want to see the original prompts used if you don’t know the original engineers intent.

If I'm working in a codebase that is heavily coded by anyone who isn't "me, within the last week", I'm going to want to see the implementation-technology-neutral requirements (and, ideally, human-written tests/test scenarios, unit and acceptance and everywhere in between, operationalizing the requirements) because I can't know the original engineer's intent.

This isn't any less true when the engine was wet biological goop instead of silicon.


How is this different from any legacy code base? Everywhere I've worked there's been swaths of code where the initial intent is lost to time and debugging them is basically archeology.

Random AI code isn't going to be categorically different from random code copied from StackOverflow, or code simply written a few years ago. Reading code will always be harder than writing it.


Legacy code is written by humans, and when you read stuff written by humans you can anticipate what their intent was because you are a human as well.

With AI, there is no intent, the AI is a probabilistic model and you have to guess if the model got it right.

It’s like the difference between driving next to a human driver vs a self driving car. You have some idea of what the human will do but the self driving car could flip out at any moment and do something irrational.


> Legacy code is written by humans, and when you read stuff written by humans you can anticipate what their intent was because you are a human as well.

Well, no. This is both technically categorically false (because you can only "anticipate" something in advance, and the intent is in the past by the time you are reading the code, though you might in principle infer their intent), and often practically false even when read with "infer" in place of "anticipate", as it is quite common for the intent of human-written code to be non-obvious, except in the sense of "the purpose of a system is what it does", in which sense AI-written code is not particularly more opaque.


No, in time it's going to look just like that, but with hallucinations and large areas of the codebase no human has ever read. No doubt digging through the code and trying to understand it will involve a lot more prompting back and forth.


> No, in time it’s going to look just like that but with hallucinations and large areas of the codebase no human has even read.

An area of code that no human has ever read is no different to me than an area of code lots of humans who aren't me have read, and that's not that different from an area of code that I have read, but not particularly recently.

I mean, except for who it is that I am going to be swearing about when it breaks and I finally do read it.


An area of code that's been read a lot is likely to be like a paved road. Sure, it could have cracks or mud, but it's generally understood, and the really bad warts have been fixed or warning signs left behind. An area of code that runs rarely and is inspected even more rarely is likely to bring more surprises and unexpected behaviors.


> How is this different from any legacy code base? Everywhere I've worked there's been swaths of code where the initial intent is lost to time and debugging them is basically archeology.

For the sake of this argument I'll concede your point that it's not different from legacy codebases, even though I disagree (I believe it is different, because those were written by humans with far more business and code context than any LLM on the market can currently consume). But then: why use AI at all if it isn't very much different?

I can say from my actual experience and specialty, which is dissecting spaghetti'd codebases (mostly from a cloud infrastructure perspective, which is where much of my career has been focused), that lost knowledge in legacy codebases usually surfaces through clues, or by just asking some basic questions of the business owners who remain. Someone knows something about the legacy code that's running the business, whether that's what it's for or what the original purpose was, and I don't realistically expect an LLM/chatbot/AI will ever be able to suss that out the way a human can, which involves a lot of meetings and talking to people (IME). This is just based on my experience untangling large codebases where the original writers had been gone for 5+ years.

From my perspective, expecting an AI to maintain huge balls of mud is much more likely to result in bigger piles of mud than it originally generated; I don't see how it can logically follow that it'll somehow improve the ball of mud such that it no longer is a ball of mud. LLMs are especially prone to this because of their 100% agreeability. And given that the current strategy for these problems is just ever-larger context windows and more compute, I don't see how expecting an AI to maintain a huge ball of mud over the long term is realistically feasible: every line of code and its related business context adds to the cost exponentially, in a way that doesn't happen when you just hire some fresh junior with a lot of salt and vinegar and get-up-and-go who could solve it with enough determination.

A common thing that comes up in SWE is that the business asks for something that's either stupid, unreasonable, or a huge waste of time and money. Senior engineers know when to say no or where to push back. LLMs and the cleverest "prompt engineers" simply don't, and I don't see any world where this gets better, again due to the agreeability issues. I also don't see or understand a world where these same AI engineers can communicate constraints or timelines to the business in a way that makes sense. I don't expect this to improve, because every business tends to have unique problems that can't simply be trained on from some training set.


My concern is that it's always harder to comprehend and debug code you didn't write, so there are hidden costs to using a copy-pasted solution even if you "get to the testing phase faster." Shaving percentages in one place could add different ones elsewhere.

Imagine if your manager announced a wonderful new productivity process: From now on a herd of excited interns will get the first stab at feature requests, and allll that the senior developers need to do is take that output and juuuust spruce it up a bit.

Then you have plenty of time for those TPS reports. (Now, later in my career, I have a lot more sympathy for Tom Smykowski, speaking with customers so the engineers don't have to.)


> I don't think we're near AGI, well any reasonable definition of it.

But wait just hear me out for a second! What if we've also sufficiently lowered the definition of "GI" at the same time...


> Of course this isn't always going to be great. Debugging this code is probably going to suck, and even with the best engineering it'll likely accelerate big balls of mud. But it's going to enable a lot of code to get written by a lot more people and it's super exciting.

This is quite literally what junior engineers are.

I've worked at many places filled to the brim with junior engineers. They weren't enjoyable places to work. Those businesses did not succeed. They sure did produce a lot of code though.


I'd also be very suspicious of this Pareto distribution, to me it implies that the 20% of work we're talking about is the hard part. Spitting out code without understanding it definitely sounds like the easy part, the part that doesn't require much if any thinking. I'd be much more interested to see a breakdown of TIME, not volume of code; how much time does using an LLM save (or not) in the context of a given task?


Whatever time is saved (if any) will be taken up by higher expectations of the worker. Imagine a manager's monologue: "Now that you have AI, wouldn't it be easier to implement those features we had to scrap because we were understaffed? Soon we'll be even more understaffed and you'll have to work harder."

Let's hope my pessimism is unfounded.


The thing with junior engineers is that they may eventually become proper seniors and you'll play your role in it by working with them. Nobody learns anything from working with LLMs.

You can't become a senior and skip being a junior.


This isn't about the promise of AI to me, it's more about this type of AI technology, which should more accurately be called LLMs, as there are others.

Skepticism aside, and looking at what's already been possible: much like Microsoft Word introduces more and more grammar-related features, LLMs are reasonably decent for the use case of taking your input and speeding up iterations with you in the driver's seat. Whether it's text in a word processor or code in an editor, there are likely meaningful enough improvements not to dismiss the technology.

The inverse of that, where someone wants a magic seed prompt to do everything, could prove to be a bit more challenging: existing models are limited to what they were trained on, and the steps to move beyond that, where they actually learn, appear to be a few years away.


> But it's going to enable a lot of code to get written by a lot more people and it's super exciting.

I think it will cut both ways. I guess it depends on the type of work. I feel like expectations will fill to meet the space of the increased productivity. At least, I can see this being true in a professional sense.

If I could be five times more productive, i.e., work 80% less, then I would be quite excited. However, I imagine most of us will still be working just as much -- if not more.

Maybe I am just jaded, but I can't help but be reminded of the old saying, "The reward for digging the biggest hole is a bigger shovel."

For personal endeavors, I am nothing but excited for this technology.


> I imagine most of us will be still be working just as much -- if not more

Yes, but there will be far fewer of us working.

Which, in a healthy society would actually be a good thing.


> Which, in a healthy society would actually be a good thing.

The bad news is that we do not and have not ever lived in one.


> You get the LLM to draft some code for you that’s 80% complete/correct.

> You tweak the last 20% by hand.

> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive

Except, it’s probably four times as hard to verify the 80% right code as it is to just write the code yourself in the first place.


Using AI is like trying to build a product with only extremely junior devs. You can get something working fairly quickly but sooner rather than later it's going to collapse in on itself and there won't be anyone around capable of fixing it. Moving past that requires someone with more experience to figure out what they did and eventually fix it, which ends up being far more expensive than correcting things along the way.

The error rate needs to come down significantly from where it currently is, and we'd also need AIs that cover each other's weaknesses - at the moment you can't just run another AI to fix the 20% of errors the first one generated, because they tend to have overlapping blind spots.

Having said all that, I think AI is hugely useful as a programming tool. Here are the places I've got most value:

- Better IDE auto-complete.

- Repetitive refactors that are beyond what's possible with a regex. For example, replacing one function with another whose usage is subtly different. LLMs can crack out those changes across a codebase and they're easy to review.

- Summarizing and extracting intent from dense technical content. I'm often implementing some part of a web standard, and being able to quickly ask an AI questions, and have it give references back to specific parts of the standard is super useful.

- Bootstrapping tests. The hardest part of writing tests, in my experience, is writing the first one - there's always a bit of boilerplate that's annoying to write. The AI is pretty good at generating a smoke test (something like the sketch below), and then you can use that as a basis for proper test coverage.
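To give an idea of the bootstrapped smoke test I mean, here is a minimal sketch; `myapp` and its `create_app` factory are made up, and it assumes a Flask-style app with pytest:

    # Hypothetical bootstrapped smoke test; module and factory names are made up.
    import pytest
    from myapp import create_app

    @pytest.fixture
    def client():
        app = create_app(testing=True)
        with app.test_client() as client:
            yield client

    def test_health_endpoint_smoke(client):
        """The boring boilerplate: app boots, health endpoint answers."""
        response = client.get("/health")
        assert response.status_code == 200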

I'm sure there are more. The point is that it's currently incredibly valuable as a tool, but it's not yet capable of being an agent.


It seems like nobody cares about code quality; it just has to work until the company is acquired.


The problem is that now, when your colleagues/contributors push out shitty code that you have to fix... you don't know if they even tried or just put in bad code from an LLM.


Yep, this is a hell I've come to know over the past year. You get people trying to build things they really have no business building, not without at least doing their own research and learning first, but the LLM in their editor lets them spit out something that looks convincing enough to get shipped, unless someone with domain experience looks at it more closely and realizes it's all wrong. It's not the same as bodged-together code from Stack Overflow because the LLM code is better at getting past the sniff test, and magical thinking around generative AI leads people to put undue trust in it. I don't think we've collectively appreciated the negative impact this is going to have on software quality going forward.


> you don't know if they even tried or just put in bad code from an LLM

You don't need to. Focus on the work, not the person.

I recall a coworker screaming at a junior developer during a code review, and I set up a meeting later on to discuss his reaction. Why was he so upset? Because the junior developer was being lazy and not trying.

How did he know this?

He didn't. He used very dubious behavioral signals.

This was pre-LLM. As applicable now as it was then. If their code is shitty, let them know and leave it at that.

On the flip side, I often review code that I think has been written by an LLM. When the code is good, I don't really care if he wrote it or asked GPT to write it. As long as he's functioning well enough for the current job.


That's a weird take. I am not blaming people for trying, I am blaming people for cheating. I have no problem with people being inexperienced, but if the number of seemingly inexperienced colleagues I have to deal with goes up, and it's because a growing fraction of them are cheating, it is natural to feel frustrated. Especially because you can't know which fraction tried their best and deserve patience and mentoring, and which fraction are lying and wasting your time.

I work for a university. I have literally caught student workers putting the output of an LLM into the code without reading it, because they did it during screen sharing. You can't teach somebody who does not want to learn (because they just want to pass the semester and have no incentive to try hard). Now it's just way easier to pretend to have put in effort.


> but if the amount of seemingly inexperienced colleagues I have to deal with goes up, and it's because a growing fraction of them are cheating, it is natural to feel frustrated.

It's also natural to ask why you think the growing fraction is because they cheated.

Look, I don't know how old you are, but the quality of the average developer has been dropping for decades. Prior to GPT, I routinely had to deal with developers who refused to learn anything (e.g. version control) and needed hand holding.[1] They didn't want to read man pages, etc. It's silly to suddenly get upset because LLMs are yet another reason for this phenomenon.

If there's a problem, it's hiring practices. Not LLMs.

> Especially because you can't know which fraction tried their best and deserve patience and mentoring, and which fraction are lying and wasting your time.

Given your comments, it's trivial to see why they would lie to you. They wouldn't to me, because I'm not going to discourage LLM usage.

> I work for a university.

Even before I read your comment, I was about to point out how your stance is valid in the domain of education, and not the workplace. And you just validated it :-)

The concept of cheating when it comes to education is vastly different compared to the work place. The goal of teaching is to learn, and even more relevant, for teachers to evaluate how well someone is learning. If they're using LLMs (or calculators, for that matter), then it becomes harder to evaluate. The goal is not to get something done, but to learn. That's why the environment in school is very constrained, and very different.

The worst jobs I've had are when people failed to realize that they're now in a different environment and they continue to apply those principles where they don't make sense. The main goal at work is to get something done, regardless of the tools.[2]

If someone tries to put good code in, stop caring if it comes from an LLM.

If someone tries to put bad code in, stop caring if it comes from an LLM.

[1] And yes, today's LLMs do a better job than many crappy developers I've had to put up with. Had LLMs existed back then and had they used them, I'd be a happier man.

[2] As long as you're not violating laws, copyright, licenses, patents, etc.


Does it matter where the bad code came from?


The pace with which ai slop can overwhelm the resources we use to review said code is pretty concerning.


Yeah, but you can teach your colleagues to get better. An LLM will give you the same crap every time.


You missed my point, there is no use teaching those who are cheating. And there is no use learning if you don't get called out.


Thanks for pointing that out. I missed that, indeed. That's a stellar point.


> You missed my point, there is no use teaching those who are cheating.

It's a very invalid point. Perhaps in your life you've never encountered people who cheat and later change their ways, but it's fairly common for them to change their ways.


The fact they might change their way does not erase the fact that I am wasting the time I spend mentoring cheaters, time I could spend mentoring non-cheaters.

Would you also say it's ok for people to steal because they might one day not-steal? So an increase in robberies does not hurt? I am not sure I understood you correctly.


> The fact they might change their way does not erase the fact that I am wasting the time I spend mentoring cheaters, time I could spend mentoring non-cheaters.

Several people have asked you, and you still haven't explained why it's OK to waste time with "non-cheaters" and not with "cheaters".

> Would you also say it's ok for people to steal because they might one day not-steal?

No. Unless stealing is permitted, which it usually isn't. LLMs, OTOH, are merely a tool and their use has no morality attached to it. It's you that seem to attach some morality to LLM use.

(Of course, if your company disallows LLM usage, that's another story).

> I am not sure I understood you correctly.

Your comments seem to make the assumption that LLM based code is shittier than what a typical poor developer would write, which I pointed out in another comment is often not true.

Just treat the code based on its merit. If it's utter crap, tell the developer it's utter crap. Just stop second guessing whether he wrote it or an LLM did. There is a threshold below which even I won't review code.


Their comment is in response to the article. The article straight up advocates for starting from LLM-generated code. This is a perfectly valid counterpoint.

Aside: it’s confusing that parent commenter used “cheating” in a completely different way than the title of the article.


I did yeah, apologies. Thanks for pointing that out.


I've deployed code with Claude that was >100X faster, all-in (verification, testing, etc) than doing it myself. (I've been writing code professionally and successfully for >30 years).

>100X isn't typical, I'd guess typical in my case is >2X, but it's happened at large scale and saved me months of work in this single instance. (Years of work overall)


If you write code 100x faster, you could probably automate almost all of it away, since that sounds like super trivial, already-solved problems. I also use Claude for my coding a lot, but I'm not sure about the real gains; I'm not sure it even gives me a noticeable speed improvement, since I'm in a loop of waiting and then fixing bugs the LLM made. Still, it's super useful for writing big docstrings and smaller tests. Maybe if I focused on basic tasks like classical backend or frontend work it would be more useful.


It's fascinating watching people in ostensibly leadership positions in our industry continue to fundamentally misunderstand how productivity works in knowledge work.


They never understood. AI is the new Mythical Man-Month.


The example I used to hear to demonstrate this type of logical fallacy is that even though one woman can make a baby in 9 months, having eight other women help doesn't make the baby available after a single month. I guess this might need to be updated to "one woman and eight LLMs can't make a baby in one month".


Totally. It used to be that LoC was widely accepted as a terrible metric to measure dev productivity by, but when talking about AI code-gen tools it's like it's all that matters.


This is pretty much it. I don't trust code produced by AI, tbh, because it almost always has logical errors, or it ignored part of the prompt so it's subtly wrong. I really like AI for prototyping and sketching, but then I usually take that prototype, maybe write some solid tests, and then write the real implementation.


If the code these LLMs produce has logical errors and bugs, I have formed the opinion that this is mostly user error. LLMs produce great code if they're used properly. By that I mean: strictly control the maximum level of complexity that you're asking the LLMs to solve.

GPT-4o can only handle rudimentary coding tasks. So I only ask it rudimentary questions. And it's rarely wrong or bugged. o1-2024-12-17 is much better, it can handle "mid" complexity tasks, it's usually never wrong on those, but a step above and it will output buggy code.

This is subjective and not perfect, but when you use the models more you can get a sense for where it starts falling apart and stay below that threshold. This means chunking your problem into multiple prompts, where each chunk is nearing the maximum level of complexity that the model can handle.


The way many LLMs are built, they don't ask questions where clarification is needed; they just eagerly produce something and fill in the gaps.


I've built a pet project with LLMs recently. It's rough around the edges, with bugs, incomplete functionality and docs, but I've built it around 4 times as fast as I would have done it myself.

https://github.com/golergka/hn-blog-recommender

Basically 95% of the code is written by Aider. I still had to do QA, make high-level decisions, review code it wrote, assign tasks and tell it when to refactor stuff.

I would never use this workflow to write a program where absolute correctness is the top priority, like a radiation therapy controller. But from this point on, I'll never waste time building pet projects, small web apps, and utilities myself.


And you'll be a worse programmer for it because you won't have the reps.


Programming is a means to an end for most of us. If you want to become an expert programmer for the sake of the craft then go for it, but most would benefit more from learning more about the business side. If you're building the wrong functionality then it doesn't matter how good your code is.


It's fascinating seeing this take on Hackernews of all places. I'm probably one of the most mercenary programmers you'll meet but this take is cynical even to me. If you have a skill that you use professionally and get paid hundreds of thousands of dollars to do, maybe you should cultivate and maintain it, idk.

You'll probably get paid more and work less if you know what you're doing, as opposed to being yet another disposable "I'll handle the business side" guy.


I've been programming since 95. I have crafted beautiful handwritten codebases with much more complicated requirements. But at this point, I see code quality not as an end goal, but one of the trade offs.


Agreed. To add to this: The article is using a rather poor comparison because it makes the assumption that all sections of a program take an equal amount of time to write.

An LLM might be able to hammer out the unit tests, boilerplate react logic, and the express backend in 5 minutes, but if it collapses on the "20%" that you need actual assistance with, then the productivity boost is significantly less than suggested.


Yeah, this is almost literally the 80-20 rule[1]; often getting the first 80% of something done takes 20% of the work, and getting the last 20% takes 80% of the work. Debugging the last 20% is already the hard part for most things, and as you mention, it's going to be a lot harder debugging LLM-generated code.

[1]: https://en.wikipedia.org/wiki/Pareto_principle


re: debugging

I dont disagree but here's another spin on what you observed.

Kernighan's law says you have to be twice as intelligent/skilled when debugging the code as you were when writing it.

An LLM-centric corollary would be: don't have the LLM write code at the boundary of its capabilities in your domain.

It's a little hard these days to know where that line lies, given the rate of improvement, but perhaps it's an intuition people are still developing with LLMs.


Except you saved yourself all that mental effort. If it takes the same amount of time and I don't need to put 110% focus in, it's a win.


Verifying code is much more mental effort than writing it because you have to reconstruct the data structure and the logic of the code in order to show that the code you’re reading is correct.


You also absorbed no knowledge of the problem domain. I'm pretty positive on LLMs but with the caveat that the programmer still should be an active participant in the development of the solution.


I gave it the knowledge of the domain. I never said the programmer shouldn't be an active participant either. There's no way around that.


No, now I get to be a code reviewer instead of a writer, and that's a lot more mental energy for me.


> Except, it’s probably four times as hard to verify the 80% right code as it is to just write the code yourself in the first place.

Why would that be true in general? You think it just so happens that a new tool gives an X speedup in ability but costs an exactly equivalent X slowdown in other places?


No, the claim is just that it’s harder to read and verify existing code than to write new code. Depending on the level of certainty you want, verifying code is much more expensive than writing code, which is why there are so many bugs and why PR review is, imo, practically worthless.

Switching the job of programming from designing and writing code to reading and verifying code isn't a good trade-off in terms of making programming easier. And my experience so far with AI tools has done nothing to change my mind on this point.


The thing that everyone seems to miss here is it's not surprising which code falls into the 80% or 20%. I can guess if an AI will write something bug-free with almost 100% accuracy. For the remaining 20%, whatever, it gives me a fine enough starting point.


To debug code that doesn't work, you generally have to fully understand the goal and the code.

To write code that does work, you generally have to fully understand the goal and what code to use.

So unless you just flat-out don't know what code to use (i.e. libraries, architecture, approach, etc.), it's going to be no faster, and probably less satisfying, to fix LLM code than it would be to just do it yourself.

Where LLMs do shine in coding (in my limited experience) is discovery. If you need to write something in a mostly unfamiliar language and ecosystem, they can be great for getting off the ground. I'm still not convinced that the mental gear-shifting between 'student asking questions' and 'researcher/analyst/coder mode' is worth the slightly more targeted responses vs. just looking up tutorials, but it's a more useful scenario.


why don't they get another LLM to do the last 20%, are they stupid?


I know how popular Steve Yegge is, and there is a bunch of stuff he's written in the past that I've enjoyed, but like, what is this? I know everyone says the Internet has ruined our attention spans, but I think it's fair to let the reader know what the point of your article is fairly early on.

As far as I can tell, this is a really unnecessarily long-winded article that is basically just "make sure you have good prompt context." And yes, when you're doing RAG, you need to make sure you pull the right data for your context.

Is this just a spam blog post for Sourcegraph?


It's an ad. The key point he works up to is `Put another way, you need a sidecar database. The data moat needs to be fast and queryable. This is a Search Problem!` And then of course this is hosted on the corporate blog of a code search platform. It's `make sure you have good prompt context` but trying to define that specifically as the thing they're selling
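Stripped of the branding, the mechanism being sold is just retrieval: embed the codebase, find the chunks most similar to the question, and paste them into the prompt. A minimal sketch of the idea (`embed()` stands in for whatever embedding model you use; nothing here is Sourcegraph's actual implementation):

    import math

    def cosine(a, b):
        """Cosine similarity between two embedding vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    def build_prompt(question, code_chunks, embed, top_k=5):
        """Stuff the k most relevant chunks of the codebase into the prompt."""
        query_vec = embed(question)
        ranked = sorted(code_chunks, key=lambda chunk: cosine(embed(chunk), query_vec), reverse=True)
        context = "\n\n".join(ranked[:top_k])
        return f"Relevant code:\n{context}\n\nQuestion: {question}"

The "sidecar database" part is just caching those embeddings somewhere fast and queryable instead of recomputing them per question.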


As another commenter said, this needs a March 2023 tag. RAG is now pretty standard and there are lots of vector databases that do this.


The semi-obvious conclusion here is that he had an idea and asked an LLM to write an article about it, in his own style.


This article really rubs me the wrong way. A company selling an AI product saying the following about AI sceptics:

> All you crazy MFs are completely overlooking the fact that software engineering exists as a discipline because you cannot EVER under any circumstances TRUST CODE.

is straight up insulting to me, because it effectively comes down to "use my product or you're a looney".

Also, two years after this post (which should be labeled 2023), I've still barely tried to entirely offload coding to an LLM, and the few times I did try, the results were pretty crap. I also really, really don't want to 'chat' with my codebase or editor. 'Chatting' to me feels about as slow as writing something myself, while I also don't get a good mental model 'for free'.

I am a moderately happy user of AI autocomplete (specifically Supermaven), but I only ever accept suggestions that are trivially correct to me. If it's not trivial, it might be useful as a guide of where to look in actual documentation if relevant, but just accepting it will lead me down a wrong path more often than not.


> How much of a productivity increase is that? Well jeepers, if you’re only doing 1/5th the work, then you are… punches buttons on calculator watch… five times as productive.

Cool, so I'm going to make five times more money? What's that you say, I'm not? So who is going to be harvesting the profits, then?


You will get paid 5x as much… assuming you are the only developer that uses this and the market for engineers is rational. However those other developers are going to be 5x as productive too so your rate should stay the same.

The company that pays you will get more code for less money and so they would make more profit if they are the only company taking advantage of this. But all companies will start to produce software at a lower cost so their profits per line of code will be squeezed to where they are now.

You will spend a lot less time writing boilerplate and searching docs or Stack Overflow. That frees you up to work on real problems, and since you'll be able to create products at a lower cost per line of code, you could potentially compete with your own offerings.

That’s my take anyway.


If you're 5x more productive from AI, you can do 5x less effort if you feel your previous productivity was sufficient for the current pay.


It's a pretty good time to be a contractor, as long as you negotiate getting paid per project/milestone and not by the hour. What used to take three days now takes one, and you can use the extra time to book more clients.


Have any of these LLM coding assistant enthusiasts actually used them? They are ok at autocomplete and writing types. They cannot yet write "80% of your code" on anything shippable. They regularly miss small but crucial details. The more 'enthusiastic' ones, like Windsurf, subtly refactor code - removing or changing functionality which will have you scratching your head. Programming is a process of building mental models. LLMs are trying to replace something that is fundamental to engineering. Maybe they'll improve, but writing the code was never the hard part. I think the more likely outcome of LLMs is the broad lowering of standards than 5x productivity for all.


I'm trying to use Cursor for full-time building of applications for my personal needs ( years of experience in the field). It can be rough, but you see glimpses of the future. In these early weeks, I am constantly having to think of new ways to minimize surprise: prompt injection via a dotfile (a collection of dos and don'ts), forcing the LLM to write itself jsdoc @instructions in problematic files, coming up with project overviews that state the goals and responsibilities of different parts. It's all very flaky, but there's a promise. I think it is a good time to try mastering the LLM as a helper, and to figure out how to play your best game using this ultra-sharp tool.

My 2c is that an engineer will be needed in the loop to build anything meaningful. It's really hard to shape the play-dough of the codebase while having to give up so much control. I see other people who can't validate the LLM BS get themselves into a hole really quickly when using multi-file editing capabilities like the Composer feature.

One thing that kind of blew me away, even though it's embarrassingly predictable, is the "parallel edit" workflow, where the LLM can edit multiple files at the same time using a shared strategy. It almost never works currently for anything non-trivial, so the agent almost never uses it unless you ask it to (run a TypeScript check and use parallel edit to fix the bugs). It's also dumb; it feels like it uses the same auto-completion model, which doesn't have access to the Composer context. But it shows how we can absorb the tax of waiting for the LLM to respond: by doing multiple things at once across many files.

And tests. Lots of tests. There's no way around it: an LLM can change things really subtly, and without tests on a big project that's impossible to fix. LLMs don't have the same fear of old files from previous ages; they'll happily start hacking on what's not broken. Cursor has a capable "code review" system with checkpoints and quick rollbacks. At this point in time we have to review most of what the LLM is spewing out. On the upside, the LLM can write decent tests, extract failing cases, etc. There's no excuse not to use tests. However, extra care needs to be taken with LLM-generated tests, to review whether their asserts are legitimate.

I tried to write an engine for a popular tabletop game without personally understanding the rules, armed with a bunch of reference material. It was difficult when I couldn't validate the assertions. I think I failed the experiment and had to learn the game rules by debugging failing tests one by one, and in many cases the culprit was a fixture generated by the LLM. We can use combined workflows where smarter models like o1-pro validate the fixtures; that typically works for well-known concepts and games.


I use Cursor as well, totally understand what you mean about glimpses of the future. The composer and tab complete are good at writing isolated units of functionality - ie pure functions, especially when they are trivial and have been written before. I appreciate that it saves me a google search or time spent on rote work.

I do wonder how much more sophisticated LLMs would have to be to really excel at multi-file or crosscutting whole-codebase editing. I have no way of really quantifying this but I suspect its a lot more than people think, and even then might require a substantially different approach to making software. Kind of reminds me of the people trying to make full length movies with video AI - cool trick but who wants to sit through 1.5 hrs of that? The gulf between those videos and an actual feature length film is _massive_, much like the difference between a prod codebase and a todo crud app. And it's not just about the level of detail, it's about coherence between small but important things and the overall structure.


I too use Cursor as my primary IDE for a variety of projects. This will sound cliche in the midst of the current AI hype, but the most critical component of getting good results with Cursor is properly managing the context and conversation length.

With large projects, I've found that a good baseline is about 300-500 lines of output, and around a dozen 500-line files as input. Inside those limits, Cursor seems to do pretty well, but if you overstuff the context window (even within the supposedly supported limits) Claude 3.5 Sonnet tends to forget important attributes of the input.

Another critical component is to keep conversations short. Don't reuse the same conversation for more than one task, and try to engineer prompts to minimize the amount/size of edits.

I definitely agree that Cursor isn't a silver bullet, but it's one of the best tools I've found for bootstrapping large projects as a solo dev.


That was a lot of words to describe... I think just "LLMs use context"? And the intro made me think he was mad about something, but I think he's actually excited? It's very confusing.

I will admit, I skimmed. No jury would convict.


It's a Steve Yegge article. His writing is enjoyable, but it's a miracle he wrote an article short enough not to skim. He's excitable and verbose.

Although he's talking up his product, it's an interesting premise: that distilling the relevant context and parts of a codebase down into an LLM's context window is a useful product for coding assistants.


I think one important aspect of this is that many -- perhaps the bulk, now -- of people writing or talking on the Internet don't have much memory of a time when digital technology was underestimated -- or at least, did not get to their current positions by recognising when something was underestimated, and going with their own, overestimated beliefs. They probably got into technology because it was self-evidently a route to success, they were advised to do so, or they were as excited as anyone else by its overt promise. Their experience is dominated by becoming disillusioned with what they were sold on.

A different slice of time, and a different set of opportunities -- as Yegge says, the recollection that if you hadn't poo-pooed some prototype, you'd have $130 million based on an undeniably successful model -- gives you a different instinct.

I really do hear the -- utterly valid -- criticisms of the early Web and Internet when I hear people's skepticism of AI, including that it is overhyped. Overhyped or not, betting on the Internet in 1994 would have gotten you far, far, more of an insight (and an income) than observing all its early flaws and ignoring or decrying it.

This is somewhat orthogonal to whether those criticisms are correct!


To be fair to the skeptics, for every Amazon or Google, there were 10 other companies that were actually bad ideas or didn't survive the dot com crash. It's also hard to say how much of the take here is apocryphal in terms of AWS being a plucky demo. It seems inevitable that someone would realize that with good enough bandwidth infrastructure, concentrating utility computing in data centers and increasing utilization of those resources through multi-tenancy is just a good idea that is beneficial for the capital owners and the renters. Research into distributed and shared computing was around since the 80's, Amazon was just the first to realize the business potential at a time when hardware virtualization actually supported what they were trying to do.


> This is somewhat orthogonal to whether those criticisms are correct!

I think this is the most important take-away in evaluating the success of many technical businesses. At the end of the day businesses succeed or fail due to lots of factors that have nothing to do with the soundness of the engineering or R&D that is at the core of the value proposition. This is an extremely frustrating fact for technically minded folks (I count myself).

Reminds me of the quote about how stock markets are like betting on the outcome of a beauty contest. The humans who control large purchasing decisions are likely much less technical than the average audience of this blog. The fact that this is frustrating doesn't make it any less true or important to internalize.


This would benefit from a "[2023]" appended to the title to make clear it isn't new.

Past discussion: https://news.ycombinator.com/item?id=35273406


Worth noting this is from March 2023.


I got to the bottom of the post and thought, wow this sounds like Steve Yegge, I wonder what happened to that guy.

So, I scrolled back up to the top, and, lo and behold...

I'm positive about coding + LLMs, personally. I like it better than going without. I also think it's ok to just use a thing and be happy about it without having to predict whether it's going to be great or awful 10 years from now. It's ok to not know the future.

While we are here, I'm currently using Cursor as my programming assistant of choice. I assume things are moving fast enough that there are already better alternatives. Any suggestions?


Warren Buffett says "Be fearful when others are greedy, and greedy when others are fearful." But in the AI space, people are both greedy AND fearful!


That’s a sign of not enough granularity.

Some folks use it for code and love it, some use it for code and hate it. Writing code is so broad though- are we coding device drivers? Prototyping greenfield CRUD apps? Fixing hairy bugs in 50year old codebases?

All of those are very different activities where folks have different needs, but discussions rarely include this context.

If you can find the right dividing line where on one side people are greedy/love the tech, and on the other they’re fearful/doubting… you’re ahead of the pack - and a bit more ready to invest.


Enthusiastic article for sure. Lots of the hype in this piece suggests that effective semantic search for populating limited context windows of LLMs will be big business.

My thoughts:

1. Will context windows grow enough so that we can afford to be lazy and drop entire knowledge bases into our queries?

2. Will the actual models of LLMs be able to more reliably capture data in a semi-structured way, perhaps bypassing the "data moat" that the author claims?


A major innovation in search solutions is eagerly anticipated.


One thing I'll note is that I'm inclined to trust the poll of engineers who have actually been using this stuff for real work, as opposed to a N=1 example of it performing well (outside the context of a real software project) and using that as the basis for declaring that This Is The Future.

I'm not saying there's not a money volcano to be had somewhere, built on top of LLMs. But his call to action is stupid. Every other "money volcano" he cites was a clever use of existing technology to meet a previously-unseen but very real need.

LLMs are the mirror image of that. They are something that's clearly, obviously useful and yet people are struggling to leverage them to the degree that _ought_ to be possible. There are no truly slam dunk applications yet, let alone money volcanos (except for the AI hype train itself.)

They are the ultimate hammer in search of a nail. And it's a pretty amazing hammer, I'm not a complete LLM skeptic. But at this point we should all be searching for the best nails, not just cheerleading for the hammer.


This piece is coming up on two years old now - it was written in March 2023.

I think it holds up. Saying "LLMs are useful for code" is a lot less controversial today than it was back in March 2023 - there are still some hold-outs who will swear that all LLM-produced code is untrustworthy junk and no good engineer would dream of using them, but I think their numbers are continuing to drop.


A technology as unreliable as LLMs is definitely not the biggest thing since the WWW, and perhaps it will fade out in time except for niche cases where reliability is not important and users are willing to pay the enormous amount it costs to run them (after the investor money has run out).


As a bit of anecdata, I used an LLM to refactor a core part of one of my side projects about 6 months or so ago.

I don't really remember how it works, probably because I didn't have to think hard about the design, I just sort of iterated it in conversation which probably meant it didn't stick.

While the iteration part was mostly positive, I don't know that it was worth it in the end to be left with something that I can understand but feels a bit foreign at the same time.


1. If you can easily write boring code with LLM, you might not have the incentive to refactor it.

2. If you let someone else solve your problems, and you don't reflect in the solution, you won't learn.

3. I rarely read the fine print, but are all LLM IDEs really the same when it comes to the intellectual property of the code? Or responsibility for bad code?


If programming-as-theory-building is at all true, then writing code with an LLM doesn't count as writing code by hand. At least not quite in the same way.

This is the problem. Managers think that the code is the product. It isn't. And LLMs can only produce the code.


And the even more jaw-dropping reality is that even this article heavily underestimates the transformative power that generative AI is about to bring to software. Using AI to generate code like this is akin to using a horse to pull an automobile.


He starts the article by giving examples of novel and unexpected ways people employed available paradigms to create value, then proceeds to sell LLMs, which are completely biased toward the most standard ways of doing things…


Gemini-Exp-1206 has a 2M-token context window. What's preventing Google from fine-tuning it on internal Google PRs, etc.? In the end you'd get one Google LLM fine-tuned for development, another fine-tuned for review, and so on.


This article is just a nicely crafted marketing ad for Cody. It impressed me more than the product itself (which is not so unique; there are many alternatives on the market, see Codeium for example).


The above post is mega-ranty. I am tempted to mega-rant in return, but I'll try to keep that down and share some material that seems related but solvable by someone making such broad claims:

- The parent article complains about a lot of "meh" responses by engineers. I often feel very "meh" about LLMs and don't use them very often. It's not because I hate them, but because the context I'd need to feed into an LLM is just a bit too big to be tractable for having it help me code (rough sizing arithmetic after this list). What I need is for my LLM code assistant to be able to comfortably accept ~7-10 git repositories, each with a code+comment line count of ~100k lines of novel (not in the training set) code, and be able to answer my questions and write somewhat useful code. I'm working in big, old, corporate, proprietary codebases, and I need an LLM to be even a little helpful with that. Maybe that's possible and easy today, but I don't know how to do that, so I don't. I'm open to suggestions.

- When I talk to a co-worker, even a junior engineer, I feel like I can have sensible discussions with them even if they may not know exactly what I'm talking about. I can trust that they'll say things like "I don't really know all the details, but it sounds like...", or they'll ask follow up questions. I do not have to tell my co-workers to ask follow up questions or to tell me about the bounds of their understanding. We're human, we talk. When I try to chat with an LLM about business problems, I feel more like I'm trying to chat with a 15 year old who is desperately trying to sound knowledgeable, where they will try their best to confidently answer my questions every time. They won't ever specify "I don't know about that, but tell me if this sounds right...". Instead, they'll confidently lie to my face with insanely fabricated details. I don't want to have to navigate a conversation with the background question of "is this the fabrications of a madman?" in the back of my head all the time, so I don't. Maybe that's the wrong value calculus, and if so, I'm open to being persuaded. Please, share your thoughts.
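To put rough numbers on the first point, using the common (assumed) rule of thumb that a line of code is on the order of 10 tokens:

    repos = 8                 # "~7-10 git repositories"
    lines_per_repo = 100_000  # "~100k lines of novel code" each
    tokens_per_line = 10      # rough assumption, not a measured number

    total_tokens = repos * lines_per_repo * tokens_per_line
    print(f"{total_tokens:,}")  # 8,000,000 tokens -- several times larger than even a 2M-token window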


The only things missing right now are a solid goal-planning and goal-validation engine, and search to define rewards.

Since code is one of those things you can train LLMs on using RL (not RLHF), they can actually do goal-based AI planning; all the people who say "I don't trust LLM code" are missing how beside the point this is.

A lot of code in this world gets accepted even if it is suboptimal, and people don't realize how much money there is to be made by writing it that way with ML models.


(2023)





