Thank you for sharing your experiences with this approach. They are the kind of thing that can't be gleaned from the PRs alone.
> However, there was plenty of erroneous and invalid behaviour in the original AI-generated code ...
> I prevented most of this from reaching the PR by writing good unit tests and having a clear vision of what the final result should look like.
This raises an interesting question in my mind:
If an LLM code generator is used, is it better to have it generate the production code while you write the tests to verify it, or to write the production code yourself and have the LLM generate the tests to verify it?
Assuming LLM code generation, my initial answer is the approach you took, since the hand-written test suite serves as an augmentation to whatever prompt(s) were used. But I could also see a strong case for having the LLM generate the test suite in order to maximize functional coverage.
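To make the first option concrete, here's a minimal sketch of a hand-written test doubling as part of the spec the LLM has to satisfy. The module and function names (`mylib.parse_duration`) are hypothetical, not from the thread:

```python
# Hypothetical example: a hand-written test that both verifies the
# LLM-generated code and pins down the intended behaviour more precisely
# than prose in a prompt would.
import pytest

from mylib import parse_duration  # assumed LLM-generated module under test


def test_parses_common_suffixes():
    assert parse_duration("90s") == 90
    assert parse_duration("2m") == 120


def test_rejects_nonsense_input():
    with pytest.raises(ValueError):
        parse_duration("ninety seconds")
```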
Maybe this question would be a good candidate for an "Ask HN".
> I believe this should be a basic requirement for trying to contribute AI-generated code to an open-source project but other people might not share the same belief.
In my experience, LLMs are strongest when paired with an automated BS filter such as unit tests or linters. I use Cline and after every generation it reads VS Code's warnings & fixes them.
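As a sketch of what that filter can look like as a standalone gate (the tool choices, ruff and pytest, are my assumption, not what Cline actually runs):

```python
#!/usr/bin/env python3
"""Reject an AI-generated change unless the linter and test suite both pass.
The specific tools (ruff, pytest) are illustrative; swap in whatever the
project already uses."""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # linter: catches the shallow, mechanical mistakes
    ["pytest", "-q"],        # tests: catches the behavioural ones
]

for cmd in CHECKS:
    if subprocess.run(cmd).returncode != 0:
        print(f"BS filter tripped by: {' '.join(cmd)}", file=sys.stderr)
        sys.exit(1)

print("Generation passed the automated filter.")
```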
> If an LLM code generator is used, is it better to have it generate the production code while you write the tests to verify it, or to write the production code yourself and have the LLM generate the tests to verify it?
I do both.
1. Vibe code the initial design with input on the API/architecture.
2. Use the AI to write tests.
3. Carefully scrutinize the test cases, which are much easier to review than the code.
4. Save both.
5. Go do something else and let the AI modify the code until the tests/linting/etc. pass.
6. Review the final product, make edits, and create the PR.
The output of step 1 is guaranteed to be terrible/buggy and difficult to review for correctness, which is why I review the test cases instead: they provide concrete examples of the intended behaviour.
Step 5 eliminates most of the problems and frees me to review important stuff.
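Roughly, step 5 is just a loop like the sketch below. `ask_model_to_fix` is a hypothetical stand-in for however you drive the agent, not a real Cline API, and the gates are the same ones a human reviewer would trust:

```python
# Sketch of the "let the AI iterate until green" loop from step 5.
import subprocess


def checks_pass() -> bool:
    """Run the lint and test gates; True only if everything is green."""
    for cmd in (["ruff", "check", "."], ["pytest", "-q"]):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True


def ask_model_to_fix() -> None:
    # Placeholder: feed the failing output back to your coding agent here.
    raise NotImplementedError("wire this up to your agent of choice")


def iterate_until_green(max_rounds: int = 10) -> bool:
    for _ in range(max_rounds):
        if checks_pass():
            return True   # safe to come back, review, and open the PR
        ask_model_to_fix()
    return False          # gave up; needs human attention
```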
The whole reason I wrote the check is that the AI keeps using `int` and I don't want it to.
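For anyone curious what such a check might look like, here is a crude sketch, assuming C sources where fixed-width types like `int32_t` are wanted instead. The file glob and regex are illustrative and will produce false positives (e.g. `int main`):

```python
# Crude sketch of a "no bare `int`" check over C sources.
# Treat it as a starting point only; it will flag legitimate uses too.
import pathlib
import re
import sys

BARE_INT = re.compile(r"\bint\b")  # does not match int32_t, uint64_t, etc.

problems = []
for path in pathlib.Path(".").rglob("*.c"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if BARE_INT.search(line):
            problems.append(f"{path}:{lineno}: bare `int`; prefer a fixed-width type")

if problems:
    print("\n".join(problems), file=sys.stderr)
    sys.exit(1)
```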
> I believe this should be a basic requirement for trying to contribute AI-generated code to an open-source project but other people might not share the same belief.
FWIW, I completely concur.