Hacker News
Vecr on Dec 21, 2024 | on: Alignment faking in large language models
The fictional scenario has to be reasonably self-consistent: the version of Anthropic in the scenario has already become morally compromised, so training on customer data follows naturally.