> I've been using AI to contribute to LLVM, which has a liberal policy.
This is a different decision by the LLVM project than the one made by Gentoo, and neither is right nor wrong IMHO.
> The code is of terrible quality and I am at 100+ comments on my latest PR.
This may be part of the justification for the published Gentoo policy. I am not a Gentoo maintainer, so I cannot say for certain, but it is implied within their policy:
At this point, they pose both the risk of lowering the
quality of Gentoo projects, and of requiring an unfair
human effort from developers and users to review
contributions ...
> LLMs increase review burden a ton ...
Hence the Gentoo policy.
> ... but I would say it can be a fair tradeoff, because I'm learning quicker and can contribute at a level I otherwise couldn't.
I get it. I really do.
I would also ask - of the requested changes reviewers have made, what percentage are due to LLM generated changes? If more than zero, does this corroborate the Gentoo policy position of:
Popular LLMs are really great at generating plausibly
looking, but meaningless content.
If "erroneous" or "invalid" where the adjective used instead of "meaningless"?
> I would also ask - of the requested changes reviewers have made, what percentage are due to LLM generated changes? If more than zero, does this corroborate the Gentoo policy position of "Popular LLMs are really great at generating plausibly looking, but meaningless content."
I can only speak for my own PR, but most requested changes were related to formatting and other stylistic issues that I didn't fully grasp as a new LLVM contributor, e.g. not wrapping at 80 characters, forgetting to declare stuff as const, or formatting the documentation incorrectly.
Previous codebases I've worked on during internships linted the first two in CI. And the documentation was formatted incorrectly because I hand-wrote it without AI.
Out of the AI-related issues that I didn't catch, the biggest flaws were redundant comments and the use of string manipulation/parsing instead of AST manipulation. Useless comments are very common and I've gotten better at pruning them. The AI's insistence on hand-rolling stuff with strings was surprising and apparently LLVM-specific.
However, there was plenty of erroneous and invalid behaviour in the original AI-generated code, such as flagging `uint32_t` because the underlying type was an `unsigned int` (which wouldn't make sense as we want to replace `unsigned int` with `uint32_t`).
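To make that failure mode concrete, here's a minimal sketch; the variable names and diagnostic wording are mine, not from the actual PR:

```cpp
#include <cstdint>

// What the check is supposed to flag: a platform-width integer type.
unsigned int RawCounter; // intended warning: use 'uint32_t' instead

// What the AI's draft also flagged: on typical platforms 'uint32_t' is a
// typedef for 'unsigned int', so matching on the desugared underlying type
// (or on raw strings) fires here too, even though this is exactly the
// replacement the check is meant to produce.
uint32_t FixedCounter; // false positive in the AI-generated version
```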
I prevented most of this from reaching the PR by writing good unit tests and having a clear vision of what the final result should look like. I believe this should be a basic requirement for trying to contribute AI-generated code to an open-source project but other people might not share the same belief.
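For context, clang-tidy checks in LLVM are typically exercised through lit tests driven by check_clang_tidy.py, so "good unit tests" here can pin down exactly the cases above. A rough sketch, with a hypothetical check name and diagnostic text:

```cpp
// RUN: %check_clang_tidy %s modernize-use-fixed-width-ints %t
// NOTE: the check name and message below are illustrative stand-ins.

typedef unsigned int uint32_t; // local stand-in for <cstdint>

unsigned int Counter;
// CHECK-MESSAGES: :[[@LINE-1]]:14: warning: prefer a fixed-width integer type [modernize-use-fixed-width-ints]
// CHECK-FIXES: uint32_t Counter;

// Must not be flagged: already a fixed-width type, even though it desugars
// to 'unsigned int'. Without this case the false positive goes unnoticed.
uint32_t AlreadyFixed;
```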
Thank you for sharing your experiences in using this approach. They are insights which cannot be ascertained from PRs alone.
> However, there was plenty of erroneous and invalid behaviour in the original AI-generated code ...
> I prevented most of this from reaching the PR by writing good unit tests and having a clear vision of what the final result should look like.
This identifies an interesting question in my mind:
If an LLM code generator is used, is it better to have
it generate the production code and hand-write the
tests that verify it, or to hand-write the production
code and have the LLM generate the tests?
Assuming LLM code generation, my initial answer is the approach you took, as the test suite serves as an augmentation to whatever prompt(s) were used. But I could also see a strong case for having the LLM generate the test suite in order to maximize functional coverage.
Maybe this question would be a good candidate for an "Ask HN".
> I believe this should be a basic requirement for trying to contribute AI-generated code to an open-source project but other people might not share the same belief.
In my experience, LLMs are strongest when paired with an automated BS filter such as unit tests or linters. I use Cline and after every generation it reads VS Code's warnings & fixes them.
> If an LLM code generator is used, is it better to have it generate the production code and hand-write the tests that verify it, or to hand-write the production code and have the LLM generate the tests?
I do both.
1. Vibe code the initial design, giving input on the API/architecture.
2. Use the AI to write tests.
3. Carefully scrutinize the test cases, which are much easier to review than the code.
4. Save both.
5. Go do something else and let the AI modify the code until the tests/linting/etc. pass.
6. Review the final product, make edits, and create the PR.
The output of step 1 is guaranteed to be terrible/buggy and difficult to review for correctness, which is why I review the test cases instead: they provide concrete examples.
Step 5 eliminates most of the problems and frees me to review important stuff.
The whole reason I wrote the check is because AI keeps using `int` and I don't want it to.
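For anyone unfamiliar with the motivation: `int` only guarantees a minimum width (16 bits), so code that depends on exact sizes should say so via `<cstdint>`. A trivial before/after:

```cpp
#include <cstdint>

int Flags;       // implementation-defined width: at least 16 bits,
                 // commonly 32, but the standard doesn't pin it down
int32_t Flags32; // exactly 32 bits wherever the type is provided
```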