I would be very curious to hear about the state of your codebase a year from now. My impression was that LLMs are not yet robust enough to produce quality, maintainable code when let loose like this. But it sounds like you're already having more success than I would have guessed possible with current models.

One practical question: presumably your codebase is much larger than an LLM's context window. How do you handle this? Don't the LLMs need certain files in context in order to handle most PRs? E.g. to avoid duplicating code, or writing something in a way that's incompatible with how it will be used upstream.





One thing I think people confuse about context: they see that an LLM has, say, a 400k context window, think their codebase is way bigger than that, and wonder how it can possibly work. Well, do you hold a 10-million-line codebase in your head at once? Of course not. You have an intuitive grasp of how the system is built and laid out, and some general names of things, and before you make a change you might search through the codebase for specific terms to see what shows up.

LLMs do the same thing. They grep through the codebase and read in only the files with interesting/matching terms, and only the relevant part of each file, in much the same way you would open a search result and only view the surrounding method or so. The context is barely used in these scenarios. Context is not static; it's built dynamically as the conversation progresses, via data coming from your system (partially through tool use).
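To make that concrete, the search tool behind an agent is roughly this simple (a sketch in TypeScript; the function name and limits are my own invention, and it assumes ripgrep is installed):

    import { execFileSync } from "node:child_process";

    // Run ripgrep and return matches plus a little surrounding context,
    // capped so the result stays tiny relative to the context window.
    function grepTool(pattern: string, repoRoot: string, maxBytes = 8_000): string {
      try {
        const out = execFileSync(
          "rg",
          ["--context", "3", "--max-count", "20", pattern, repoRoot],
          { encoding: "utf8" },
        );
        return out.length > maxBytes
          ? out.slice(0, maxBytes) + "\n…(truncated)"
          : out;
      } catch {
        return "(no matches)"; // rg exits non-zero when nothing matches
      }
    }

    // The agent calls this with terms guessed from the task, reads the
    // snippets, and only then decides which whole files (if any) to open.
    console.log(grepTool("createInvoice", "./src"));

Each call returns a few KB at most, which is why even a huge workspace never comes close to filling the window.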

I frequently use LLMs in a VS Code workspace with around 40 repos, consisting of microservices, frontends, NuGet and npm packages, IaC, etc. Altogether it's many millions of lines of code, and I can ask it questions about anything in the codebase and it has no issues managing context. I don't even add files manually to context (that's actually worse, because it puts the entire file into context even if it's not all used). I just refer to files by name and the LLM is smart enough to read them in as appropriate. I have a couple of JSON files that are megabytes of configuration, and I can tell it to summarize or extract examples out of those files and it'll just sample sections to get an overview.
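The sampling trick on those big JSON files is equally unexciting under the hood. A sketch, assuming a read-a-slice file tool (names and the config path are made up):

    import { openSync, readSync, closeSync } from "node:fs";

    // Read only a slice of a huge file (e.g. a multi-MB JSON config)
    // instead of pulling the whole thing into context.
    function sampleFile(path: string, offset = 0, bytes = 4_096): string {
      const fd = openSync(path, "r");
      try {
        const buf = Buffer.alloc(bytes);
        const n = readSync(fd, buf, 0, bytes, offset);
        return buf.toString("utf8", 0, n);
      } finally {
        closeSync(fd);
      }
    }

    // A few probes at different offsets are enough for the model to
    // describe the file's shape and pull out representative examples.
    console.log(sampleFile("config/tenants.json"));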


Yes, I do have a map in my head of any codebase I work on. I know where most of the files on the main code paths are, and if you describe the symptoms of a bug I can often tell you the method, or even the line, that's probably causing it if it's on a 'hot' path.

Isn't that what we mean by 'learning' a codebase? I know my ability is supercharged compared to most devs, but most colleagues have it to some extent, and I've met some devs with an even more impressive knack for it than me, so it's not like I'm a magic unicorn. Ironically, I have a terrible memory for a lot of other things, especially 'facts'.

You can sorta make a crappy version of that for AI agents with agent files and skills.
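For example, a hand-written 'map' section in an AGENTS.md that the agent reads at the start of every session (paths below are invented for illustration):

    ## Codebase map
    - services/billing/  -- invoicing, Stripe webhooks
    - services/auth/     -- sessions, SSO (change with care)
    - packages/shared/   -- types and helpers used by both frontends
    - Checkout bugs almost always start in services/billing/src/charge.ts

It's static and goes stale, which is why it's the crappy version, but it gives the agent the same 'where do I even start looking' shortcut.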


There’s a company called driver.ai whose idea is to parse your codebase and provide the “map” (navigation of code structure and connectivity) to LLMs. (I haven’t tried it.)

> You have an intuitive grasp of how the system is built and laid out,

Because they are human; intuition is a human trait, not an LLM code-grinder trait.


So, it does sometimes duplicate code, especially where we have a packages/ directory of TypeScript code shared between two Next.js apps and some Temporal workers. We 'solve' this with some AGENT.md rules, but it doesn't always work. It's still an open issue.
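The rules are of this flavor (wording illustrative, not our exact file):

    ## Shared code
    - Before writing a new helper, grep packages/ for an existing one,
      e.g. `rg "formatCurrency" packages/`.
    - Never copy code out of packages/ into an app; import it from the
      workspace package instead.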

The quality is generally good for what we're doing, but we review the heck out of it.


LLMs currently seem to be very myopic in their planning. The benchmarks currently being targeted, such as SWE-bench, all reward short-term correctness and completeness without taking long-term refactorability into account.

In fact, the two are in a sense at odds with each other: refactoring things sometimes means explicitly _disobeying_ the user prompt to "get things done", and going on a side-quest to clean things up. You could manually prompt the LLM to go out and refactor things, but doing that requires _you_ to read the code and identify places that seem suboptimal.
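Concretely, the side-quest prompt you end up writing yourself looks something like this (hypothetical example):

    Both Next.js apps have their own copy of fetchWithRetry. Extract one
    implementation into packages/shared, update both call sites, and
    delete the duplicates. Don't change any behavior.

The prompt is easy; noticing that the duplication exists is the part that still needs your eyes on the code.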



