How is this different from any legacy code base? Everywhere I've worked there have been swaths of code where the initial intent is lost to time and debugging them is basically archeology.
Random AI code isn't going to be categorically different from random code copied from StackOverflow, or code simply written a few years ago. Reading code will always be harder than writing it.
Legacy code is written by humans, and when you read stuff written by humans you can anticipate what their intent was because you are a human as well.
With AI, there is no intent: the AI is a probabilistic model, and you have to guess whether the model got it right.
It’s like the difference between driving next to a human driver vs. a self-driving car. You have some idea of what the human will do, but the self-driving car could flip out at any moment and do something irrational.
> Legacy code is written by humans, and when you read stuff written by humans you can anticipate what their intent was because you are a human as well.
Well, no. This is both technically categorically false (because you can only "anticipate" something in advance, and the intent is in the past by the time you are reading the code, though you might in principle infer their intent), and often practically false even when read with "infer" in place of "anticipate": it is quite common for the intent of human-written code to be non-obvious, except in the sense of "the purpose of a system is what it does", in which sense AI-written code is not particularly more opaque.
No, in time it’s going to look just like that, but with hallucinations and large areas of the codebase no human has even read. No doubt digging through code and trying to understand it will involve a lot more prompting back and forth.
> No, in time it’s going to look just like that but with hallucinations and large areas of the codebase no human has even read.
An area of code that no human has ever read is no different to me than an area of code lots of humans who aren't me have read, and that's not that different from an area of code that I have read, but not particularly recently.
I mean, except for who it is that I am going to be swearing about when it breaks and I finally do read it.
An area of code that’s been read a lot is likely to be like a paved road. Sure, it could have cracks or mud, but it’s generally understood, and the really bad warts get fixed or warning signs left behind. An area of code that runs rarely and is inspected even more rarely is likely to bring more surprises and unexpected behaviors.
> How is this different from any legacy code base? Everywhere I've worked there have been swaths of code where the initial intent is lost to time and debugging them is basically archeology.
For the sake of this argument I will concede your point even though I disagree with it - specifically, that it's not different from legacy code bases (I believe it is, because those were written by humans with much more business and code context than any LLM on the market can currently consume). But then, why use AI at all if it isn't very much different?
I can say from my actual experience and specialty, which is dissecting spaghetti'd codebases (more from a cloud infrastructure perspective, which is where much of my career has been focused), that lost knowledge in legacy codebases usually presents itself in clues, or surfaces just by asking some basic questions of the business owners who do remain. Someone knows something about the legacy code that's running the business, whether that's what it's for or what its original purpose was, and I don't realistically expect an LLM/chatbot/AI will ever be able to suss that out the way a human could, since it involves a lot of meetings and talking to people (IME). This is just based on my experience untangling large codebases where the original writers had been gone for 5+ years.

From my perspective, expecting an AI to maintain huge balls of mud is much more likely to result in bigger piles of mud than it originally generated - I don't see how it logically follows that it will somehow improve the ball of mud such that it is no longer a ball of mud. LLMs are especially prone to this because of their 100% agreeability. And given the current strategy of just using increasingly larger context windows and more compute to account for these kinds of problems, I don't see how expecting an AI to maintain a huge ball of mud for a long time is realistically feasible: every line of code and piece of related business context adds to the cost exponentially, in a way that doesn't happen when you just hire some fresh junior with a lot of salt and vinegar and get-up-and-go, who could solve the same problems with enough determination.
A common thing that comes up in SWE is that the business asks for something that's either stupid, unreasonable, or a huge waste of time and money - senior engineers know when to say no or where to push back. LLMs and the cleverest "prompt engineers" simply don't, and I don't see any world where this gets better, again due to the agreeability issues. I also don't see a world where these same AI engineers can communicate constraints or timelines to the business in a way that makes sense. I don't expect this to improve, because every business tends to have its own unique problems that can't simply be trained on from some training set.