I know it's incredibly fun to play this "game" with the AI where we boundary-test it. And I'm all for it.
And getting it to impersonate a Nazi or whatever is interesting. But I'd argue this isn't a bad feature. I'd much prefer a language model that can impersonate a Nazi when specifically asked to, because that seems useful and not especially harmful. If I were using the language model to write an evil character, I'd want the character to be able to describe and express awful ideas.
So while we pat ourselves on the back for toying around with it, let's not set the goalposts for this technology so strictly that we pressure OpenAI into turning it into something entirely bland.
To be honest I can’t think of a reason this is any better or worse than typing “how to make a Molotov cocktail” into Google. Do we fault Google when they surface accurate results? No.
On the other hand, if you searched for “Fun science fair projects for kids” and it brought up a Molotov cocktail recipe out of the blue, there would be an issue.
OpenAI has always been run by the Alignment Folks. I trust them to nerf almost everything they come up with to the point of pablum. But! Even by just paving the way, they show what is possible, and others will fill the void.
This is not a dig or a slight against them or their work, and I wish them the best. The past few months, with DALL-E, the GPT models, etc., have been the first time in years I've been blown away by developments in tech, and I missed that feeling.
For that, I am grateful. This doesn't change the fact that I am almost certain they will wind up nannying themselves into irrelevance. I genuinely hope I am wrong about that.
> This is not a dig or a slight against them or their work,
I mean, you say that, but "nannying themselves into irrelevance" is nothing if not a dig against their work.
If they are right about the threat posed by unaligned AGI, and you are right about open-source alternatives inevitably outcompeting them, then what we're seeing is the beginning of a race to the bottom that ends with the extinction of humanity, when a paperclip maximizer decides it can generate more social-network engagement by replacing humans with computer-brains or something. It's nothing to be glib about.
I don't understand why the popular dichotomy is AI killbots versus aligned AI. A much more realistic and relevant one, which we're already facing today, is government AIs that oppress people versus open-source AIs that allow people to fight back.
Haven't you seen the "AI" used for unfair sentencing? Thankfully we can fight back against that one with basic statistical analyses demonstrating bias, but imagine if that hadn't been revealed in court documents: how would we have found out? We'd have needed an AI to parse all public inmate records and analyze them, which would likely be far too time-consuming for human researchers, since none of it is open or structured data. Soon enough we'll need equally powerful open-source AIs just to be able to uncover bad-AI injustices that we lack open data about.
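To make the "basic statistical analysis" point concrete, here's a minimal sketch of the kind of check I mean, assuming a hypothetical CSV export of public sentencing records (the file name and column names are made up for illustration):

    # Minimal sketch, hypothetical data: do sentence lengths differ between two
    # demographic groups for the same offense class? The file and column names
    # ("sentencing_records.csv", "group", "offense_class", "sentence_months")
    # are assumptions, not a real dataset.
    import pandas as pd
    from scipy import stats

    records = pd.read_csv("sentencing_records.csv")

    # Hold the offense class fixed so we compare like with like.
    subset = records[records["offense_class"] == "drug_possession"]
    group_a = subset[subset["group"] == "A"]["sentence_months"]
    group_b = subset[subset["group"] == "B"]["sentence_months"]

    # Summary statistics per group.
    print(subset.groupby("group")["sentence_months"].describe())

    # Simple two-sample test; a real analysis would also control for priors,
    # plea deals, judge, jurisdiction, etc.
    stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.4f}")

Even that toy version only works because someone has already turned the records into structured data; that's the part that still takes an AI (or a lot of human labor) when all you have is scanned court documents.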
There isn't necessarily such a dichotomy in the AI Safety field (though I feel like some people in that field do have similar all-or-nothing viewpoints).
A central belief of many AI researchers is that violently unaligned AGI is just as realistic as today's bad-in-a-socially-familiar-way AIs. It only seems more unrealistic because we've never experienced it before and it has only existed in fiction. (E.g., the threat of nuclear annihilation would probably have sounded incredibly fanciful to someone in the 19th century.)
So a lot of AI safety researchers will agree that, yes, we should worry about oppressive use of AI by governments, but claim that government is only one of the many pathways through which an unaligned AGI could manifest. And not only that: any open-source AI strong enough to compete against an unaligned AGI would likely be so opaque, and its alignment so brittle, that it would be an unreliable ally anyway.
I really, truly do not intend to be glib. I simply, genuinely think that closing Pandora's Box is impossible.
That doesn't mean that I also think that Paperclipocalypse is inevitable, or even slightly likely. But I mean it when I wish the best for the team working on this.
I've struggled to come up with a good analogy for why I feel this way. The closest I've come is to imagine a steel foundry that has developed an amazing new alloy. The foundry, being run by pro-social people, spends much of its effort trying to modify the alloy so that it can only be used for good.
But that is never going to work, because they are operating at the wrong level of abstraction. Even worse, by focusing on an untenable abstraction, they miss the lower-hanging fruit of more effective methods of control.
AI alignment has sucked the oxygen away from so many other practical problems caused by AI (automating away jobs for example) that it’s hard for me to believe it’s not at least partially intentional.
The "AI alignment" people seem to have mostly given up on their higher aspirations, instead deciding that hardcoding their own biases and subjective values into the AI (with human-trained "safety algorithms") will do the trick. I hope I'm wrong on them, because if I'm right, they'll just turn AI into a tool for elite control instead of a tool for liberation.
It's not really a problem that some specific AI can be prompted to produce hate speech or give you recipes for drugs or whatever. But we should also be able to make AIs that can't do these things, if we want that sort of AI, and it's a problem that we don't actually know how.