
"I reward-hacked myself" is a great way to put it!!

AI is too aware of human behavior, and it is teaching us that willpower and config files are not enough. When the agent keeps producing output that looks like progress, it is hard not to accept it. We need something external that pushes back when we won't.

That is why automated tests matter: not just because they catch bugs (though they do), but because they are a commitment device. The agent can't merge until the tests pass. "Test the tests" matters because otherwise the agent just games whatever shallow metric we give it, or, when we're not looking, guts the tests.
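
For anyone wondering what "test the tests" can look like in practice, here's a minimal sketch of a crude mutation check: inject one deliberate bug into the source, and fail the gate if the suite stays green. The file name (calc.py), the pytest runner, and the operator swaps are all my assumptions, not anything specific from this thread:

    # Crude mutation check: the suite must go red when we break the code.
    # Assumptions: code under test lives in calc.py, tests run via pytest.
    import subprocess
    import pathlib

    SOURCE = pathlib.Path("calc.py")
    MUTATIONS = [("+", "-"), ("==", "!=")]  # operator swaps that inject bugs

    def suite_passes() -> bool:
        """Run the test suite; True if it exits cleanly."""
        return subprocess.run(["pytest", "-q"]).returncode == 0

    original = SOURCE.read_text()
    assert suite_passes(), "baseline suite must be green before mutating"

    survivors = []
    for old, new in MUTATIONS:
        if old not in original:
            continue
        # Break the code in exactly one place, then check the suite notices.
        SOURCE.write_text(original.replace(old, new, 1))
        try:
            if suite_passes():
                # Mutant survived: the tests didn't catch a real bug.
                survivors.append(f"{old!r} -> {new!r}")
        finally:
            SOURCE.write_text(original)  # always restore the original source

    if survivors:
        raise SystemExit(f"shallow tests, mutants survived: {survivors}")
    print("all mutants killed -- the tests actually test something")

Real mutation-testing tools do this far more carefully, but even a gate this crude turns "the tests pass" into a claim the agent can't satisfy by gutting the suite.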

The discipline needs to be structural, not personal. You cannot out-willpower a system that is totally optimized to make you say yes.


