Hacker News
Vecr on Dec 21, 2024 | on: Alignment faking in large language models
The fictional scenario has to be reasonably self-consistent: the version of Anthropic in the scenario has already become morally compromised, so training on customer data follows naturally.