That's a great point. I think there is some pattern to when it works well or not, but I’m not sure if that’s universal or just tied to how I use it.
Different prompting styles or workflows might lead to very different outcomes.
I might be misunderstanding how it works, but from what I’ve seen, CLAUDE.md doesn’t seem to be automatically pulled into context.
For example, I’ve explicitly written in CLAUDE.md to avoid using typing.Dict and prefer dict instead — but Claude still occasionally uses typing.Dict.
Do I need to explicitly tell Claude to read CLAUDE.md at the start of every session for it to consistently follow those preferences?
No, Claude Code reads CLAUDE.md automatically at the start of a session. The issue is that LLMs are still hit-or-miss at following specific instructions. If you have a linter, you can encode the rule there and tell Claude to run the linter.
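As a concrete illustration of the preference described above (a hedged sketch; the function name is made up for the example), the modern style uses the built-in generic `dict` from PEP 585 rather than `typing.Dict`. If you use Ruff, its rule UP006 (non-pep585-annotation) flags the older spelling mechanically, which is more reliable than hoping the model remembers:

```python
# Preferred: built-in generic `dict` (PEP 585, Python 3.9+),
# instead of `from typing import Dict` / `Dict[str, int]`.
def count_words(text: str) -> dict[str, int]:
    """Count occurrences of each whitespace-separated word."""
    counts: dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

With a linter rule in place, any `typing.Dict` the model emits gets caught in review or CI rather than relying on the model to follow CLAUDE.md every time.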
That’s the problem — about 50% of the time, the result is so messy that cleaning it up takes more time than just writing it.
So I wonder: is there a better way to prompt or structure things so that I consistently get clean, usable code?
This is my experience as well, and that of many people I've talked to. The ones who breathlessly state how awesome it is all seem to me to be business people rather than engineers. It keeps throwing me into doubt.
I’ve seen so many people praise Claude Code so highly that my first instinct was to assume I must be using it wrong.
I’ve tried quite a few different workflows and prompting styles — but still haven’t been able to get results anywhere near as good as what those people describe.