Just to be clear, they weren't stupid 'is 1+1=2' type tests. I had the agent sca...

Just to be clear, they weren't stupid 'is 1+1=2' type tests.

I had the agent scan the UX of the app being built, find all the common flows and save them to a markdown file.

I then asked the agent to find edge cases for them and come up with tests for those scenarios. I then set off parallel subagents to develop the the test suite.

It found some really interesting edge cases running them - so even if they never failed again there is value there.

I do realise in hindsight it makes it sound like the tests were just a load of nonsense. I was blown away with how well Claude Code + Opus 4.5 + 6 parallel subagents handled this.