> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool. This would take me, or many developers I know and respect, days to write by hand.
I have no problem believing that Claude generated 300 passing tests. I have a very hard time believing those tests were all well thought out, concise, and actually testing the desired behavior while communicating to the next person or agent how the system under test is supposed to work. I'd give very good odds that at least some of those tests are subtly testing themselves (e.g. mocking a function, calling said function, then asserting the mock was called). Many of them are probably also testing implementation details that were never intended to be part of the contract.
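To make that failure mode concrete, here's a hypothetical Python sketch of a "test that tests itself" (the `email_service` object and test name are invented for illustration):

```python
from unittest.mock import MagicMock

def test_welcome_email_is_sent():
    # The real email-sending code is replaced wholesale by a mock...
    email_service = MagicMock()
    # ...the test itself calls the mock...
    email_service.send("user@example.com", "Welcome!")
    # ...and then asserts the mock was called. Nothing from the
    # system under test ever runs, so this passes unconditionally.
    email_service.send.assert_called_once_with("user@example.com", "Welcome!")

test_welcome_email_is_sent()
```

A test like this stays green no matter how broken the production code is, because the only thing it exercises is its own wiring.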
I'm not anti-AI, I use it regularly, but all of these articles about how crazy productive it is skip over the crazy amount of supervision it needs. Yes, it can spit out code fast, but unless you're prepared to spend a significant chunk of that "saved" time CAREFULLY (more carefully than with a human) reviewing code, you've accepted a big drop in quality.
The benefit of having a team of QA engineers create tests is their differing perspectives, so with LLMs being trained to act like affirmation engines you have to wonder how that impacts the test cases they create. It's the problem of LLMs being miserable at critiques manifesting itself in a different way.
However, in saying that, I am by no means an AI hater, but rather I just want models to be better than they currently are. I am tired of the tech demos and benchmark stats that don't really mean much aside from impressing someone who's not in a critical thinking mindset.
Very similar experience here. I have not once managed to get an LLM to generate good tests, even for very simple code. It generally writes tautologies that will pass with high confidence.
Anecdotes etc. etc., but the AI tests I've been sent to review have been absolute shit. Stuff like tests that just check that calling a function doesn't crash the program, with no assertions other than "end of test method reached".
Yes, sometimes those tests are necessary, but it seemed to do this everywhere because it made the code coverage percentage go up, even though the tests were useless.
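For illustration, a hypothetical sketch of that pattern (`parse_config` is an invented stand-in for the code under test):

```python
def parse_config(text):
    # Invented stand-in: parses "key=value" lines into a dict.
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

def test_parse_config_does_not_crash():
    parse_config("host=localhost\nport=8080")
    # No assertions: "end of test method reached" is the entire check.
    # Coverage goes up, but a parser that returned garbage would still pass.

test_parse_config_does_not_crash()
```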
I have also had great experiences with AI cranking out straightforward boilerplate or asking C++ template metaprogramming questions. It's not all negative. But net-net it feels like it takes more work in total to use AI as you have to learn to recognize when it just won't handle the task, which can happen a lot. And you need to keep up with what it did enough to be able to take over. And reading code is harder than writing it.
I’ve seen agents produce plenty of those tests, but recently I’ve seen them generate some actually decent unit tests that I wouldn’t have thought of myself. It’s a bit of a crapshoot
I had CC write a bunch of tests to make sure some refactoring didn't break anything, and then I ran the app and it crashed out of the gate. Why? Because despite the verbosity of the tests, it turns out it had mocked the most important parts under test, so the _actual_ connections weren't being tested, and while CC was happy to claim victory with all tests green, the app was broken.
So you're openly saying you're fine with quantity over quality... in software engineering? That's fine for an MVP, maybe, but nothing beyond that IMHO, unless they're throwaway scripts.
There is exactly one "best" programmer in the world, and at this moment he/she is working on at most one project. Every other project in the world is accepting less than the "best" possible quality. Yes... in software engineering.
As soon as you sat down at the keyboard this morning, your employer accepted a sacrifice in quality for the sake of quantity. So did mine. Because neither one of us is the best. They could have hired someone better but they hired you and they're fine with that. They'd rather have the code you produce today than not have it.
It's the same for an AI. It could produce some code for you, right now, for nearly free. Would you rather have that code or not have it? It depends on the situation; not always, but sometimes it's worth having.
I didn't intend to imply "best" even in the scope of a team, let alone every software engineer in the world. But, I understand your point and it's fair.
Here is the thing, most software engineers are not designing rockets, they are making basic CRUD apps. If there is a minor defect it can be caught and corrected without much issue. Our jobs are a lot less "critical infrastructure" than a lot of software engineers will allow their egos to accept.
Sure, if you are making some medical surgery robot, do it right, but if you are making a website that recommends wine pairings, who cares if one of the buttons has a weird animation bug that doesn't even get noticed for a couple of years.
I think I'm one of "most" engineers, and I haven't ever worked on something that was "just" a CRUD app. Having a DB behind your web app doesn't make it "just" a CRUD app.
It's really overestimated how many simple apps exist.
I've worked on regular SaaS products of different kinds, cloud software, hosting software, etc., really representative of most of the web-enabled software out there.
For every one of them there has been an almost negligible amount of CRUD code; the meat of every one of those apps was very specific business logic. Some were also heavy on the frontend with an equal amount of complexity on the backend. As a senior/staff level engineer you also have to dive into other things like platform enablement, internal tooling, background jobs and data wrangling, distributed architectures, etc., which are even farther from CRUD.
Not to call you out but this is exactly what I meant when I said software engineers have egos that will not let them accept that they are not designing critical stuff.
Comparing your cloud-based CRUD app to a missile is a perfect illustration. There is no dishonor in admitting that our stuff isn't going to kill anyone if there is a bug. Don't write bad code, but also sometimes just getting something out the door is much better than perfect quality (bird in the hand and all that).
Banking software is critical, but guess what, most software engineers are not writing banking software. I never said no software engineers write critical code. Heck, I'd argue most will, at some point in their careers, write something that needs to be as bug-free as possible.
My point is that for most software engineering, getting a product out is more important than a super high quality bar that slows everything down.
If you are writing banking software or flight control systems please do it with care, if you are making some React based recipe website or something I don't really care (99% of software engineering falls into this latter category in my opinion).
Software engineers need to get over themselves a bit, AI really exposed how many were just getting by making repetitive junk and thinking they were special.
> most software engineers are not writing banking software
Many software engineers write software for people who won't like the idea that their request/case can be ignored/failed/lost, when expressed openly on the front page of your business offering. Are bookings important enough? Are gifts for significant events important? Maybe you're okay with losing my code commits every once in a while, I don't know. And I'm not sure why you think it's okay to spread this bad management idea of "not valuable or critical enough" among engineers who should know better and who should keep sources of bad ideas at bay when it comes to software quality in general.
Not to call you out either, but it seems you have really no idea what a basic CRUD app is. Which is fine, I guess not everyone likes to read the base definitions of these things. It's clear I replied to the wrong person, as we don't have a shared understanding of complexity.
I don't think zero unit tests is the right answer either. And if you actually take the time to read all 300 and cull the useless or overlapping ones, you've invested much more than 10% of the time it would have taken you.
Having a zillion unit tests (of questionable quality) is a huge pita when you try to refactor.
When I am writing unit tests (or other tests), I'm thinking about all the time I'll save by catching bugs early -- either as I write the test or in the future as regressions crop up. So to place too much importance on the amount of time invested now is missing the point, and makes me think that person is just going through the motions. Of course if I'm writing throwaway code or a POC, I'll probably skip writing tests at all.
In order to add coverage for scenarios that I haven't even thought of, I prefer fuzz testing. Then I get a lot more than 200-300 tests, and I don't even pretend to spend time reviewing the tests until they fail.
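As a minimal sketch of what that buys you: random inputs plus one invariant, instead of hand-enumerated cases. The run-length codec below is an invented example, and real fuzzers (Hypothesis, AFL, libFuzzer) are far more sophisticated about input generation:

```python
import random

def run_length_encode(s):
    # Invented example codec: "aaab" -> [("a", 3), ("b", 1)]
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append((s[i], j - i))
        i = j
    return out

def run_length_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

random.seed(0)
for _ in range(1000):  # 1000 generated cases, none written by hand
    s = "".join(random.choice("ab") for _ in range(random.randrange(20)))
    assert run_length_decode(run_length_encode(s)) == s  # round-trip invariant
```

The point is that you state one property the code must satisfy and let generated inputs probe the edge cases (empty string, long runs, alternations) you'd never bother writing individually.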
If you want to use an LLM to help expedite the typing of tests you have thought of, fine. If you just tell it to write the suite for itself, that's equivalent to hiring a (mediocre to bad) new grad and forcing them to write tests for you. If that's as good of an outcome as doing it yourself, I can only assume you are brand new to software engineering.
The main benefit of writing tests is that it forces the developer to think about what they just wrote and what it is supposed to do. I often will find bugs while writing tests.
I've worked on projects with 2,000+ unit tests that are essentially useless, often fail when nothing is wrong, and rarely detect actual bugs. It is absolutely worse than having 0 tests. This is common when developers write tests to satisfy code coverage metrics, instead of in an effort to make sure their code works properly.
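A hypothetical example of the kind of test that fails when nothing is wrong: it pins an internal call pattern rather than the observable result (`total_price` and the `tax` mock are invented for illustration):

```python
from unittest.mock import MagicMock

def total_price(items, tax_calc):
    # Invented code under test: subtotal plus tax on the subtotal.
    subtotal = sum(items)
    return subtotal + tax_calc.compute(subtotal)

def test_total_price():
    tax = MagicMock()
    tax.compute.return_value = 2
    assert total_price([10, 10], tax) == 22  # observable behavior: worth asserting
    tax.compute.assert_called_once_with(20)  # internal detail: breaks the moment tax
                                             # is computed per item or cached, even
                                             # though the total is still correct

test_total_price()
```

The first assertion survives any correct refactor; the second turns every harmless internal change into a red build, which is how suites end up "often failing when nothing is wrong".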
Hundreds of tests that were written basically for free in a few minutes even though a lot of them are kind of dumb?
Or hundreds of tests that were written for a five figure sum that took weeks or months, and only some of them are kind of dumb?
If you’re just thinking of code as the end in and of itself, then of course, the handcrafted artisanal product is better. If you think of code like an owner, an incidental expense towards solving a problem that has value, then cheap and disposable wins every time. We can throw our hands up about “quality” and all that, but that baby was thrown out with the bathwater a very, very long time ago. The modern Web is slower than the older web. Desktop applications are just web browsers. Enterprise software barely works. Windows 11 happened. I don’t think anybody even bothers to scrutinize their dependency chains except for, I don’t know, like maybe missile guidance or something. And I just want to say Claude is not responsible for any of this. You humans are.
Neither. Tests should be written by developers only when it saves them time. The cost of writing them should be negative.
Instead of writing hundreds of useless tests so that the code coverage report shows high numbers, it is better to write a couple dozen tests based on business needs and code complexity.
Having used Bentley software products I can tell you with complete certainty that professional software developers have extremely bad judgment when it comes to the need to test software and verify its functionality. Developers just think they know what they’re doing because there’s typically not a strong feedback mechanism that inflicts serious career damage when they do things that are extremely lazy or stupid or unethical. How many people lost their job or had to change their name and live out the rest of their days in Juarez Mexico over AWS’ incomprehensible configuration causing an internet brown out? Anyone? A teenager serves cold onion rings at a burger joint and he’s on the street. Some lazy dweeb at Amazon blows up the internet and - come on, isn’t it about the friends we made along the way? It’s obscene and the lack of professionalism and accountability is a total disgrace.