LLMs are doing what you train them to do. See for example "The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions" by Eric Wallace et al.


Interesting. Doesn't solve the problem entirely but seems to be a viable strategy to mitigate it somewhat.



