Likewise, just because you've been forbidden to do something, doesn't mean that it's bad or the wrong action to take. We've really opened Pandora's box with AI. I'm not all doom and gloom about it like some prominent figures in the space, but taking some time to pause and reflect on its implications certainly seems warranted.
An LLM is a tool. If the tool is not supposed to do something yet does something anyway, then the tool is broken. Radically different from, say, a soldier not following an illegal order, because soldier being a human possesses free will and agency.
Well no, breaking that rule would still be the wrong action, even if you consider it morally better. By analogy, a nuke would be malfunctioning if it failed to explode, even if that is morally better.
> a nuke would be malfunctioning if it failed to explode, even if that is morally better.
Something failing can be good. When you talk about "bad or the wrong", generally we are not talking about operational mechanics but rather morals. There is nothing good or bad about any mechanical operation per se.
Bad: 1) of poor quality or a low standard, 2) not such as to be hoped for or desired, 3) failing to conform to standards of moral virtue or acceptable conduct.
(Oxford Dictionary of English.)
A broken tool is of poor quality and therefore can be called bad. If a broken tool accidentally causes an ethically good thing to happen by not functioning as designed, that does not make such a tool a good tool.
A mere tool like an LLM does not decide the ethics of good or bad and cannot be “taught” basic ethical behavior.
Examples of bad as in “morally dubious”:
— Using some tool for morally bad purposes (or profit from others using the tool for bad purposes).
— Knowingly creating/installing/deploying a broken or harmful tool for use in an important situation for personal benefit, for example making your company use some tool because you are invested in that tool ignoring that the tool is problematic.
— Creating/installing/deploying a tool knowing it causes harm to others (or refusing to even consider the harm to others), for example using other people’ work to create a tool that makes those same people lose jobs.
Examples of bad as in “low quality”:
— A malfunctioning tool, for example a tool that is not supposed to access some data and yet accesses it anyway.
Examples of a combination of both versions of bad:
— A low quality tool that accesses data it isn’t supposed to access, which was built using other people’s work with the foreseeable end result of those people losing their jobs (so that their former employers pay the company that built that tool instead).
That’s why everybody uses context to understand the exact meaning.
The context was “when would an AI agent doing something it’s not permitted to do ever not be bad”. Since we are talking about a tool and not a being capable of ethical evaluation, reasoning, and therefore morally good or bad actions, the only useful meaning of “bad” or “wrong” here is as in “broken” or “malfunctioning”, not as in “unethical”. After all, you wouldn’t talk about a gun’s trigger failing as being “morally good”.
when the instructions to not do something are the problem or "wrong"
i.e. when the AI company puts guards in to prevent their LLM from talking about elections, there is nothing inherently wrong in talking about elections, but the companies are doing it because of the PR risk in today's media / social environment
Unfortunately yes, teaching AI the entirety of human ethics is the only foolproof solution. That's not easy though. For example, what about the case where a script is not executable, would it then be unethical for the AI to suggest running chmod +x? It's probably pretty difficult to "teach" a language model the ethical difference between that and running cat .env
If you tell them to pay too much attention to human ethics you may find that they'll email the FBI if they spot evidence of unethical behavior anywhere in the content you expose them to: https://www.snitchbench.com/methodology
Well, the question of what is "too much" of a snitch is also a question of ethics. Clearly we just have to teach the AI to find the sweet spot between snitching on somebody planning a surprise party and somebody planning a mass murder. Where does tax fraud fit in? Smoking weed?