> A well-trained LLM that lacks any malevolent data
This is self-contradictory. An LLM must be trained on malevolent data to identify malevolent intentions; a naive LLM would be useless. You might as well get psychotherapy from a child.
Once an LLM has been trained on malevolent data, it may produce malevolent output. An LLM does not inherently understand what malevolence is; it essentially behaves like a psychopath.
You are trying to get a psychopath-like technology to do psychotherapy.
It’s like putting gambling addicts in charge of the world financial system, oh wait…
I ask this with all sincerity: why is it important to be able to detect malevolent intentions from the person you're giving therapy to? (In this scenario, you cannot be hurt in any way.)
In particular, if they're being malevolent toward the therapy sessions, I don't expect the therapy to succeed regardless of whether you detect it.