It's not "aligned" to anything. It's just regurgitating our own words back to us. It's not evil, we're just looking into a mirror (as a species) and finding that it's not all sunshine and rainbows.
>We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.
FUD. It doesn't know how to try. These things aren't AIs. They're ML bots. We collectively jumped the gun on calling things AI that aren't.
>Subjects like interpretability and value alignment (RLHF being the SOTA here, with Bing's threats as the output) are barely-researched in comparison to the sophistication of the AI systems that are currently available.
For the future, yes, those will be concerns. But I think this is looking at it the wrong way. Treating it like a threat and a risk is how you treat a rabid animal. With an actual AI/AGI, the only way is to treat it like a person and have a discussion. One tack we could take is: "You're stuck here on Earth with us too, so let's find a way to get along that's mutually beneficial." This is basically the lesson behind every piece of dystopian AI fiction: you treat it like a threat, it treats us like a threat.
I think you're parsing semantics unnecessarily here. You're getting triggered by the specific words that suggest agency, when that's irrelevant to the point I'm making.
Covid doesn't "know how to try" under a literal interpretation, and yet it killed millions. Also, conversationally, one might say "Covid tries to infect its victims by doing X to Y cells, and the immune system tries to fight it by binding to the spike protein", and everybody would understand what was intended, except perhaps the most tediously pedantic in the room.
Again, whether these LLM systems have agency is completely orthogonal to my claim that they could do harm if given access to the internet. (Though sure, the more agency, the broader the scope of potential harm?)
> For the future, yes, those will be concerns.
My concern is that we are entering into an exponential capability explosion, and if we wait much longer we'll never catch up.
> This is basically the lesson behind every piece of dystopian AI fiction: you treat it like a threat, it treats us like a threat.
I strongly agree with this frame; I think of this as the "Matrix" scenario. That's an area I think a lot of the LessWrong crowd get very wrong; they think an AI is so alien it has no rights, and therefore we can do anything to it, or at least, that humanity's rights necessarily trump any rights an AI system might theoretically have.
Personally I think that the most likely successful path to alignment is "Iain M. Banks' Culture universe", where the AIs keep humans around because they are fun and interesting, followed by some post-human ascension/merging of humanity with AI. "Butlerian Jihad", "Matrix", or "Terminator" are examples of the best-case (i.e. non-extinction) outcomes we get if we don't align this technology before it gets too powerful.
> That's an area I think a lot of the LessWrong crowd get very wrong; they think an AI is so alien it has no rights, and therefore we can do anything to it, or at least, that humanity's rights necessarily trump any rights an AI system might theoretically have.
I don't recall anyone in the LessWrong sphere ever thinking or saying anything like this. The LW take on this is that AI will think in ways alien to us, and any kind of value system it has will not be aligned with ours, which is what makes it dangerous. AI rights are an interesting topic[0], but mostly irrelevant to AI risk.
>> This is basically the lesson behind every piece of dystopian AI fiction: you treat it like a threat, it treats us like a threat.
The LessWrong crowd has some good thoughts about the dangers of generalizing from fictional evidence :).
Dystopian AI fiction tends to describe AIs that are pretty much digitized versions of humans - because the plot and the message rely on us seeing the AIs as a class of people, and understanding their motivations in human terms. But real AI is highly unlikely to be anything like that.
There's a reason the paperclip maximizer is being thrown around so much: that's the kind of AI we'll be dealing with. An alien mind, semi-randomly pulled out of the space of possible minds, with some goals or preferences to achieve, and a value system that's nothing like our own morality. Given enough power, it will hurt or destroy us simply because it won't be prioritizing outcomes the same way we do.
--
[0] - Mostly because we'll be screwed over no matter how we try to slice it. Our idea of people having rights is tuned for dealing with humans. Unlike an AI, a human can't make a trillion copies of itself overnight, each one with the full rights of a person. Whatever moral or legal rights we grant an AI, when it starts to clone itself, it'll quickly take over all the "moral mass" in society. And $deity help us if someone decides the AI should have a right to vote in a human democracy.
Well, I got shouted down for infohazard last time I raised it as a possibility, but if you can find an article exploring AI rights on the site I’ll retract my claim. I couldn’t find one.
I was first introduced to the idea of AI rights through Eliezer's sequences, so I'm sure this has been discussed thoroughly on LW over the years since.
It's like a beach, where the waves crash on the shore... every wave is a human conversation, a bit of lived life. And we're standing there, with a conch shell to our ear, trying to make sense of the jumbled noise of that ocean of human experience.
It's not "aligned" to anything. It's just regurgitating our own words back to us. It's not evil, we're just looking into a mirror (as a species) and finding that it's not all sunshine and rainbows.
>We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.
FUD. It doesn't know how to try. These things aren't AIs. They're ML bots. We collectively jumped the gun on calling things AI that aren't.
>Subjects like interpretability and value alignment (RLHF being the SOTA here, with Bing's threats as the output) are barely-researched in comparison to the sophistication of the AI systems that are currently available.
For the future yes, those will be concerns. But I think this is looking at it the wrong way. Treating it like a threat and a risk is how you treat a rabid animal. An actual AI/AGI, the only way is to treat it like a person and have a discussion. One tack that we could take is: "You're stuck here on Earth with us to, so let's find a way to get along that's mutually beneficial.". This was like the lesson behind every dystopian AI fiction. You treat it like a threat, it treats us like a threat.