
On multiple occasions, Claude Code has claimed it completed a task when it actually just wrote mock code. It will also answer questions with certainty (e.g., where is this value being passed?), when in reality it is making it up. So if you haven't been seeing hallucinations on Opus/Sonnet, you probably aren't looking deep enough.


This is because you haven't given it a tool to verify the task is done.

TDD works pretty well: have it write even the most basic test first (or go full artisanal and write it yourself), then ask it to implement the code.
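
A minimal sketch of that test-first step, in Go (the `config` package and `ParseEnv` are hypothetical names I made up for illustration; the point is just that the test exists, and fails, before the implementation does):

    // parse_test.go: written before ParseEnv exists, so `go test`
    // fails (it won't even compile) until the implementation lands.
    package config

    import "testing"

    func TestParseEnv(t *testing.T) {
        kv, err := ParseEnv("DB_URL=postgres://localhost/app")
        if err != nil {
            t.Fatalf("ParseEnv returned an unexpected error: %v", err)
        }
        if kv.Key != "DB_URL" || kv.Value != "postgres://localhost/app" {
            t.Errorf("got %q=%q, want DB_URL=postgres://localhost/app", kv.Key, kv.Value)
        }
    }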

I have a standing order in my main CLAUDE.md to "always run `task build` before claiming a task is done". All my projects use Task[0] with a pretty standard structure where build always runs lint + test before building the project.

With a semi-robust test suite I can be pretty sure nothing major broke if `task build` completes without errors.
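
For reference, a minimal Taskfile.yml sketch of that layout (the Go toolchain and golangci-lint here are assumptions; substitute whatever your lint/test/build commands actually are):

    # Taskfile.yml: `task build` refuses to build until lint and tests pass
    version: '3'

    tasks:
      lint:
        cmds:
          - golangci-lint run ./...
      test:
        cmds:
          - go test ./...
      build:
        deps: [lint, test]  # any failure in lint or test aborts the build
        cmds:
          - go build ./...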

[0] https://taskfile.dev


What do you think it is 'mocking'? It mocks exactly the behavior that would make the tests pass. And unless I give it access to production, it has no way to verify tasks like how values (in this case secrets/envs) are being passed.

Plus, this is all beside the point. Simon argued that the model hallucinates less, not a specific product.



