> Let me now clarify - "silicon" does not have capabilities humans have relevant...

joe_the_user · 2025-07-17T04:03:03 1752724983

A well-trained LLM that lacks any malevolent data...

The scale needed to produce an LLM that is fluent enough to be convincing precludes fine-grained filtering of input data. The usual methods of controlling an LLM essentially involve a broad-brush "don't say stuff like that" (RLHF) that inherently misses a lot of subtlties.

And even more, defining malevolent data is extremely difficult. Therapists often go along with things a patient say because otherwise they break rapport. But therapists have to balk once the patient dives into destructive delusions. But data of a therapy can't be easily labeled with "here's where you have to stop", just to name one problem.

Dylan16807 · 2025-07-17T05:06:12 1752728772

So that's all true but the same argument works if you say a low percentage of malevolent data, and that's far from impossible.

joe_the_user · 2025-07-17T06:10:00 1752732600

It's remarkable how many people are uncritically talking of "malevolent data" as it is was a well-defined concept that everyone knows is the source of bad things.

A simple good search reveals ... this very thread as a primary source on the topic of "malevolent data" (ha, ha). But it should be noted that all other sources mentioning the phrase define it as data intentionally modified to produce a bad effect. It seems clear the problems of badly behaved LLMs don't come from this. Sycophancy, notably, doesn't just appear out of "sycophantic data" cleverly inserted by the association of allied sycophants.

Dylan16807 · 2025-07-17T06:22:53 1752733373

I don't find it very remarkable that when one person makes up a term that's pretty easy to understand, other people in the same conversation use the same term.

In the context of this conversation, it was a response to someone talking about malevolent human therapists, and worried about AIs being trained to do the same things. So that means it's text where one of the participants is acting malevolently in those same ways.

joe_the_user · 2025-07-17T19:49:25 1752781765

I suppose remarkable or not depends on viewpoint.

For me, hearing this fantastical talk of "malevolent data" is like hearing people who know little about chemistry or engines saying "internal combustion cars are fine long as we don't run them on 'carbon-filled-fuel'". Otherwise, see my comment above.

Dylan16807 · 2025-07-17T19:53:03 1752781983

I love your example there. Because there is an answer. There are ICE cars that run on hydrogen.

The thing they're talking about is hard but it's not impossible.

joe_the_user · 2025-07-18T01:32:28 1752802348

Sure, it's not literally impossible. There are ICE cars that run on hydrogen. But you can't practically adapt an existing gasoline car to run on hydrogen. My point is that mobilizing terminology gives people with no knowledge of details the illusion they can speak reasonably about the topic.

Dylan16807 · 2025-07-18T04:11:16 1752811876

> But you can't practically adapt an existing gasoline car to run on hydrogen.

You can do it pretty practically. Figuring out a supply is probably worse than the conversion itself.

> My point is that mobilizing terminology gives people with no knowledge of details the illusion they can speak reasonably about the topic.

"mobilizing terminology"? They just stuck two words together so they wouldn't have to say "training data that has the same features as a conversation with a malevolent therapist" or some similar phrase over and over. There's no expertise to be had, and there's no pretense of expertise either.

And the idea of filtering it out is understandable to a normal person: straightforward and a ton of work.

ClumsyPilot · 2025-07-17T05:39:26 1752730766

> A well-trained LLM that lacks any malevolent data

This is self-contradictory. An LLM must have malevolent data to identify malevolent intentions. A naive LLM will be useless. Might as well get psychotherapy from a child.

Once LLM has malevolent data, it may produce malevolent output. LLM does not inherently understand what is malevolence. It basically behaves like a psychopath.

You are trying to get a psychopath-like technology to do psychotherapy.

It’s like putting gambling addicts in charge of the world financial system, oh wait…

Dylan16807 · 2025-07-17T06:40:28 1752734428

I ask this with all sincerity, why is it important to be able to detect malevolent intentions from the person you're giving therapy to? (In this scenario, you cannot be hurt in any way.)

In particular, if they're being malevolent toward the therapy sessions I don't expect the therapy to succeed regardless of whether you detect it.

AdieuToLogic · 2025-07-17T04:31:00 1752726660

> A well-trained LLM that lacks any malevolent data, may well be better than a human psychopath who happens to have therapy credentials.

Interesting that in this scenario, the LLM is presented in its assumed general case condition and the human is presented in the pathological one. Furthermore, there already exists an example of an LLM intentionally made (retrained?) to exhibit pathological behavior:

  "Grok praises Hitler, gives credit to Musk for removing 'woke filters'"[0]

> And it may also be better than nothing at all for someone who is unable to reach a human therapist for one reason or another.

Here is a counterargument to "anything is better than nothing" the article posits:

  The New York Times, Futurism, and 404 Media reported cases 
  of users developing delusions after ChatGPT validated 
  conspiracy theories, including one man who was told he 
  should increase his ketamine intake to "escape" a 
  simulation.

> Where for years I heard many people making the same mistake you're making, of saying that silicon could never demonstrate the flair and creativity of human chess players; that turned out to be false.

Chess is a game with specific rules, complex enough to make optimal strategy exhaustive searches infeasible due to exponential cost, yet it exists in a provably correct mathematical domain.

Therapy shares nothing with this other than the time it might take a person to become an expert.

0 - https://arstechnica.com/tech-policy/2025/07/grok-praises-hit...

Dylan16807 · 2025-07-17T06:37:58 1752734278

> Interesting that in this scenario, the LLM is presented in its assumed general case condition and the human is presented in the pathological one.

They were replying to a comment comparing a general case human and a pathological LLM. So yeah, they flipped it around as part of making their point.