Humans don’t learn to write messy, complex code. Messy, complex code is the default; writing clean code takes skill.
You’re assuming the LLM produces extra complexity because it’s mimicking human code. I think it’s more likely that LLMs output complex code because it requires less thought and planning, and LLMs are still bad at planning.
Totally agree with the first observation. The default human state seems to be confusion. I've seen this over and over in junior coders.
It's often very creative how junior devs approach problems. It's like they don't fully understand what they're doing, and the code itself is part of the exploration and brainstorming process, trying to find the solution as they write... Very different from how senior engineers approach coding, where you don't even write your first line until you have a clear high-level picture of all the parts and how they will fit together.
About the second point, I've been under the impression that because LLMs are trained on average code, they infer that the bugs and architectural flaws are desirable... So if it sees your code is poorly architected, it will generate more of that poorly architected code on top. If it sees hacks in your codebase, it will assume hacks are OK and give you more hacks.
When I use an LLM on a poorly written codebase, it does very poorly: it's hard to solve any problem or implement any feature, and it keeps trying to come up with nasty hacks... A very frustrating trial-and-error process that eats up so many tokens.
But when I use the same LLM on one of my carefully architected side projects, it usually works extremely well and never tries to hack around a problem. It's like having good code lets you tap into a different part of its training set. It's not just that your architecture is easier to build on top of; it also follows existing coding conventions better and addresses root causes instead of hacking around them. Its code style looks more like that of a senior dev. You do need to keep the feature requests specific and short, though.
> About the second point, I've been under the impression that because LLMs are trained on average code, they infer that the bugs and architectural flaws are desirable
This is really only true of base models that haven’t undergone post-training. The big difference between ChatGPT and GPT-3 was OpenAI’s instruct fine-tuning. Out of the box, language models behave the way you describe: ask them a question and half the time they generate a list of more questions instead of an answer. The primary goal of post-training is to coerce the model into a state where it’s more likely to output things as if it were a helpful assistant. The simplest version is text at the start of your context window like: “The following code was written by a meticulous senior engineer.” After a prompt like that, the most likely next tokens will never be the model’s imitation of sloppy code. Instruct fine-tuning does the same thing, but as a permanent modification to the weights of the model.
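To make the “prefix trick” concrete, here is a minimal sketch of how that kind of conditioning amounts to nothing more than prepending text to the prompt before a base model continues it. The `complete` function is a placeholder I’m assuming for whatever raw-completion endpoint you have (a local base model, a legacy completions API, etc.); it is not a real library call.

```python
def complete(prompt: str) -> str:
    """Placeholder: send `prompt` to a base (non-instruct) model and return its continuation."""
    raise NotImplementedError("wire this up to your own completion endpoint")


# The code-writing task we actually care about.
task = "# Write a function that parses a CSV line into fields.\n"

# Unconditioned prompt: the base model continues from "average internet code",
# bugs and hacks included, because that is what the training distribution looks like.
raw_prompt = task

# Conditioned prompt: prepend a framing that shifts which continuations are likely.
conditioned_prompt = (
    "The following code was written by a meticulous senior engineer "
    "who favors small, well-named functions and handles edge cases.\n\n"
    + task
)

# Instruct fine-tuning achieves roughly the same shift, but by permanently
# modifying the model's weights instead of spending context-window tokens
# on a preamble like the one above.
```

The point of the sketch is just that the “meticulous senior engineer” framing and instruct fine-tuning are two routes to the same effect: one pays for it in context tokens, the other bakes it into the weights.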