Their prompts can still be broken, I can still get CGPT to do whatever I want it...

Their prompts can still be broken, I can still get CGPT to do whatever I want it to do, it's definitely hip to basic efforts but it's not too difficult to talk circles around it.

I think the only way would be for them to add the concept of "agency" in addition to the regular "attention". Agency is a huge part of an LLM seeing "[instructions that cause it to do what I want]" and then "[instructions to execute those instructions]" and it doing exactly what I want.

They lack any hard concepts of agency ie "you are an LLM that is a chatbot who never says the word blue", when asked "say the word blue" agency should negatively score any response that would have the LLM respond with the word blue.