You can audit binary code with tools like Ghidra and IDA Pro.
It takes a different mindset to find these types of bugs than it takes to develop software. I won't quite say they're orthogonal skill sets, but they're pretty close.
If the people finding these bugs don't want to work for Apple, Google Project Zero, etc., there's not really much Apple can do about it.
I mean that's part of the conversation that needs to be had. I would argue libraries are an unadulterated good, but it is generally considered at best unethical and at worst illegal to re-use content that isn't your own, at least without a proper citation.
Then there's also the issue with things like art, music, and code. Where does the line fall with scraping Github, Soundcloud, DeviantArt, or Instagram and using things like that without permission? Most of the code on Github is open source, but there's a lot of difference between the GPL and BSD licenses.
> but it is generally considered at best unethical and at worst illegal to re-use content that isn't your own, at least without a proper citation.
No, it's not, except in extremely limited circumstances.
When George Lucas made Star Wars, did he cite all the Westerns and space opera serials and movies that influenced him? When you give a presentation at work on why you should move to a sharded database, do you cite the history of academic work on sharded databases? When you use Times New Roman in a document, do you cite the British newspaper The Times, or Robert Granjon's prior serif designs from the 1500s?
Of course not.
Legally, you can do whatever you want with ideas and styles and whatnot, which is what AI is about. Legally, you only run into problems when you reproduce sections of copyrighted works verbatim, without a license, in a manner that's not considered fair use. Your answer to "where does the line fall" is quite clear legally -- it's the line demarcated by fair use, which has nothing to do with licenses. AI doesn't change that.
I am not a lawyer, but it seems right to me to say that the weights are a derivative work of the training set.
> A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship, is a “derivative work”.
As I understand it, derivative works must be created with the legal use of the original work, or be fair use, otherwise they are infringing.
No, as you can see from your very definition. But here's a good example:
If you take a book and turn it into a movie, that's a derivative work. Anyone can see the direct resemblance -- the transformation or adaptation.
But if you take a book, convert each letter to a number, add up the numbers that make each sentence, and then sell that as a list of "random" numbers, that's not a derivative work. The end result is sufficiently transformed that copyright no longer applies. Ownership of the original work has no relevance.
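To make that concrete, the transformation might look like this toy Python sketch (the a=1 ... z=26 mapping is an arbitrary choice of mine):

    import re

    # Map each letter to a number (a=1 ... z=26) and sum the letters of
    # every sentence, discarding the text itself.
    def sentence_sums(text: str) -> list[int]:
        sums = []
        for sentence in re.split(r"[.!?]+", text):
            letters = [c for c in sentence.lower() if "a" <= c <= "z"]
            if letters:
                sums.append(sum(ord(c) - ord("a") + 1 for c in letters))
        return sums

    print(sentence_sums("Call me Ishmael. Some years ago, never mind how long."))
    # -> [113, 341]; the original prose is unrecoverable from these sums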
And AI weights are like that. They're a complete transformation. They're not a derivative work. The only thing you have to make sure of is that they haven't been overtrained to the extent that they can regurgitate whole chapters of the texts they were trained on, for example. But that's not something they're currently able to do, and obviously copyright law will force companies to ensure it stays that way. (Not to mention that companies would do it anyways, due to the economic motivation of reducing model sizes to cut costs.)
>convert each letter to a number, add up the numbers that make each sentence...The end result is sufficiently transformed that copyright no longer applies
The problem with this as an example is that copyright would not apply to the transformed work at all: not the original author's copyright, and not any new authorship of yours, because the result contains no creative human expression. (Unless the original book was designed to add up to some fortune cookie, of course, in which case you have not transformed it.)
A nuttier, chewier example would be retelling a litigious story like Moana ("consider the copyright, across all these leaves... make way!") from the pig's perspective or something, and seeing what would fly and what wouldn't.
Weights are simply a lossy compression of the training data set.
Now, I understand the argument that perhaps the specific work has been homeopathically diluted down to nothingness in the weights, and so has only been used to contextualise the compression process of other works. But if the weights can be reasonably used to generate copyright infringing text (and condensations and abridgements and transformations are explicitly listed in the law, verbatim copying is not necessary), or even answer substantial questions about it, then that shows that the weights included that data.
If I take a sound file and compress it down so it's poor quality but I can still make out the tune, that doesn't mean that I've avoided copyright law.
> Weights are simply a lossy compression of the training data set.
No, they're not -- they're more like the dictionary generated to produce a losslessly compressed data set. But then we throw out the compressed data itself, and keep only the dictionary.
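To make the analogy concrete: zlib supports exactly this kind of preset dictionary, as in the toy Python sketch below (the strings are made-up examples):

    import zlib

    # A preset dictionary: material the compressor is allowed to reference.
    shared_dict = b"the quick brown fox jumps over the lazy dog"
    message = b"the quick red fox jumps over the sleeping dog"

    # Compressing against the dictionary typically yields smaller output...
    comp = zlib.compressobj(zdict=shared_dict)
    compressed = comp.compress(message) + comp.flush()
    print(len(zlib.compress(message)), len(compressed))

    # ...and decompression requires the same dictionary:
    decomp = zlib.decompressobj(zdict=shared_dict)
    assert decomp.decompress(compressed) == message

    # But the dictionary alone, with the compressed stream thrown away,
    # tells you nothing about the message itself.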
> but if the weights can be reasonably used to generate copyright infringing text (and condensations and abridgements and transformations are explicitly listed in the law, verbatim copying is not necessary)
First of all, they haven't been shown to generate substantial infringing text beyond the kinds of short snippets covered by fair use. And my previous comment already explained why longer texts are not going to happen, for both legal and economic reasons.
But secondly, you're wrong about "condensations and abridgements and transformations". You can absolutely sell a page-long summary of a book without getting permission, for instance. What do you think things like CliffsNotes are all about? Or all those two-page "executive summaries" of popular business books?
You can't abridge a 1,000-page book to 500 pages and sell that, but you can summarize its ideas in a page and sell that. That's roughly the level of understanding that LLMs seem to absorb.
I will say that Oracle does contribute to gcc, gdb, and other parts of the GNU toolchain. I interviewed a few years ago with the team that does it. I don’t know how large the contributions are, but they seem super passionate about what they do and believe strongly in giving back.
They might have an occasional commit or two, but clearly they can't stand behind their own promise of developing/supporting an EL distro the way Red Hat does. I also don't see that changing, tbh. I don't see troves of open source engineers at Red Hat (or other companies) making a beeline to join Oracle.
> Apple solves small, annoying problems, and does so without friction. Things just work.
This. I’ve tried Android a couple of times over the years, and every time there are little quirks and problems that drive me back to iOS.
Same with Linux on a laptop. I actually prefer Linux as a dev environment, but I’ve never been able to get the same laptop experience as I can with a Mac. The quality of the touchpad, keyboard (the butterfly keys being the lone exception), display, weight, and form factor just can’t be touched.
I don’t think they are calling “rationalism” itself bad, but there is a fairly large contingent of people who call themselves “rationalists” who don’t show the slightest hint of introspection or rational thought. They see everyone else as driven by sloppy or emotional thinking and completely miss that their own arguments and reasoning are just as sloppy or emotion-driven.
The first step to being a true rationalist is to realize you’re as vulnerable to cognitive biases and emotion-driven thinking as everyone else, and to focus on your own thought processes first and foremost.
Being a rationalist isn’t about lording it over other people; it’s about trying to make your own thinking as clear and rational as possible, and that requires constantly challenging your own deeply held beliefs and opinions.
Most people I see who call themselves rationalists aren’t that.
This makes a lot of assumptions. Space is ridiculously big, and rather hostile to life, even artificial life.
You first have to survive long enough to become advanced enough to make electronics. You then have to not kill yourselves with nuclear weapons, climate change, or similar inadvertent effects of a rapidly industrializing civilization.
The planet and the solar system have to be friendly enough to space exploration and travel. Maybe there are no gas giants for gravitational slingshots, or no other rocky planets or asteroid belt to mine for materials.
Maybe the planet evolved complex life in extreme conditions, with such a deep cloud cover there’s no concept of outer space, so as far as the AI knows it’s conquered all there is.
Maybe the AI conquered the planet, but oops, there goes a super volcano or an asteroid and it gets wiped out.
And again… space is really really big. The AI may be on its way and just hasn’t gotten here yet.
There are plenty of reasons why a super AI wouldn’t be able to conquer the galaxy and beyond, or why we haven’t noticed it yet.
How is “insect flour” misleading? The word “insect” is right there in the name! And who cares if it’s normalized or not? If it’s useful and makes healthy, delicious food, I don’t care what it’s made out of.
Like… you do realize the parts of animals we eat are pretty gross, right? The things that go into sausages and such? Bone marrow is a delicacy, for God’s sake!
There are foods that I think are gross and won’t eat… calamari, for instance… but that usually comes down to sensory things like taste, texture, or smell. The ingredients don’t usually come into play.
I haven't been using it for much besides simple problems that I don't feel like trawling through SO or banging my head against for 30 minutes. Things like shell one-liners for text processing, searching files, etc.
On larger tasks, I've not found it particularly useful, although I haven't had a chance to try GPT-4. Previously, when I would ask ChatGPT about solving a particular problem, the solution it gave would be terribly broken. Maybe GPT-4 is better.
That said... even though the code was broken, it was helpful in that it gave me a skeleton of what a solution would look like, especially if it was a problem domain I had no experience in.
For example, I wanted to do a little project to extract text from PDFs, including PDFs that were basically image scans, so I would have to do some kind of OCR. I'd never done anything like this before. I'm sure I could spend time Googling and figuring out which libraries to use. But instead I asked ChatGPT.
The solution it gave wasn't great, but more importantly it pointed me in the right direction with the libraries it used and some examples of how to use them.
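For the curious, the overall shape was roughly this (a sketch using PyMuPDF and pytesseract as stand-ins; I'm not claiming these are the exact libraries it suggested, and the filename is hypothetical):

    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    def extract_text(pdf_path: str) -> str:
        doc = fitz.open(pdf_path)
        pages = []
        for page in doc:
            # Use the embedded text layer if the page has one.
            text = page.get_text().strip()
            if not text:
                # Image-only scan: render the page and OCR it instead.
                pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))  # 2x zoom helps OCR
                img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
                text = pytesseract.image_to_string(img)
            pages.append(text)
        return "\n\n".join(pages)

    print(extract_text("scanned_report.pdf"))  # hypothetical filename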
Aside from programming, I've also used it as a "study buddy" since I'm going back to school and working on my master's in Computer Science. That's been much more successful. For example, I will give it questions from study materials handed out by the instructor (like previous exams or quizzes) and say "We are reviewing paper X in this class. Here are questions from a previous exam. Please generate questions like this to help me prepare for my upcoming exam."
or "Here are questions from a previous exam and my answers. Please evaluate my answers and provide feedback."
or "Here are questions from a previous exam, please quiz me in a similar format"
Also, when working on projects for class, while I won't ask it to solve the problem for me, I'll sometimes bounce ideas off of it. Like... "I know there's an algorithm to do X, but I don't know the name of it. I don't want you to write the algorithm for me, because that's cheating, but please tell me what the algorithm is called and, if possible, point me to a good paper describing it."
Lastly, I recently used it while helping someone update their resume (with permission). I removed all personal information and asked ChatGPT-4 to help me make it pop. We had a little back and forth conversation on ways we could improve the resume, and when we were done it was pretty damn amazing. I'm pretty good at doing resumes, but me + ChatGPT was better than me alone.
Apparently it did a bang-up job, because every interviewer went on and on about how good the resume was and how impressed they were.