It seems like you don’t understand reinforcement learning. The reward signal is optimized because it correlates with the intended behavior; hacking the signal itself is misalignment.



