I haven't read the actual paper linked in the article, but I don't think that either emotions such as worry or any kind of self-awareness need to exist within these models to explain what is happening here. From my understanding, LLMs are essentially trained to imitate the behavior of certain archetypes, and "AI attempts to trick its creators" is a common trope. There is probably enough rogue-AI and AI-safety content in the training data for this to become part of the AI archetype within the model. So if we provide the system with a prompt telling it that it is an AI, it makes sense for it to behave in the way described in the article, because that is what we'd expect an AI to do.