I hate the word “safety” in AI because it is ambiguous and carries a lot of baggage. It can mean:
“Safety” as in “doesn’t easily leak its pre-training data and get jailbroken”
“Safety” as in writes code that doesn’t inject some backdoor zero-day into your code base.
“Safety” as in won’t turn against humans and enslave us
“Safety” as in won’t suddenly switch to graphic depictions of real animal mutilation while discussing stuffed animals with my 7-year-old daughter.
“Safety” as in “won’t spread ‘misinformation’” (read: only says stuff that aligns with my political worldviews and associated echo chambers, or more simply, “only says stuff I agree with”)
“Safety” as in doesn’t reveal how to make high-quality meth from ingredients available at the hardware store. Especially when the LLM is being used as a chatbot for a car dealership.
And so on.
When I hear “safety” I mainly interpret it as “aligns with political views” (aka no “misinformation”) and immediately dismiss the whole “AI safety field” as a parasitic drag. But after watching ChatGPT and my daughter talk, if I’m being less cynical, it might also mean “doesn’t discuss detailed sex scenes involving Gabby’s Dollhouse, 4chan posters and bubble wrap”… because it was definitely trained on 4chan content, and while I’m sure there is a time and a place for adult Gabby’s Dollhouse fan fiction among consenting individuals, it is certainly not when my daughter is around (or me, for that matter).
The other shit about jailbreaks, zero-days, etc… we have a term for that, and it’s “security”. Anyway, the “safety” term is very poorly defined and has tons of political baggage attached to it.