
You don't need a license to train AI, but that's not the point.

The license applies to the use of your work - a EULA, if you like. If you provide a license which states that the work cannot be used for training, then users of your work are required to follow its terms, and failing to do so is a violation of the license.

If you've made your work available to use, without an explicit clause specifying that it can't be used for training AI, then it is a valid use of your work that doesn't violate its licensing terms.

EULAs themselves are a grey area where clauses are not necessarily enforceable, but it's ultimately for a court to decide, and it would definitely be better to have the restriction as a clause in your license.

If the user ticks a box to declare they've read the terms and agree, it's hard for them to argue otherwise in court.

A sufficiently advanced LLM should be able to read the term and comprehend that it can't learn from the work and should ignore anything from it. As AI improves it will be more difficult for the providers to argue that term violations were accidental because their AIs would specifically need to be instructed to ignore such clauses - and the developers who provide such instructions would have a much weaker defence in a court.



AI companies don’t need to be users of the software to train on its source code.

And, again, if they don’t need a license then the license is irrelevant.

Lastly, if you place restrictions on the use of the software then it’s no longer open source. You certainly can’t do it with the GPL.


> Lastly, if you place restrictions on the use of the software then it’s no longer open source.

This particular point is the topic of the post, to be fair. A project with a copyleft license can still be "open source", even if it's not FOSS.

> AI companies don’t need to be users of the software to train on its source code.

I don't think this is true (or at least it's moot given the word "users"); they still need to legally obtain a copy of the source code, which is offered by the creator alongside a license. You can't just download a project from GitHub and disregard the license that comes with it.


I don't really care about "open source" (per the OSI definition). I'm specifically proposing a new kind of license which does place a restriction on use - that of using the work to train AI. I'm suggesting that such a license apply not only to programs but to the source code itself, with a prominent notice in each source file, and also to works other than computer programs - basically any creative work which carries a prominent notice that it's not to be used for AI training.

Disclaimer: IANAL

In any claim you would argue that you made it clear that your code should not be used for such a purpose, that the AI developer was made aware of your intention, that the AI developer was negligent, and so forth.

There are frameworks for dealing with cases where no contract exists, such as unjust enrichment[1], which (according to the citation), is examined as follows:

    1. Was the defendant enriched?
    2. Was the enrichment at the expense of the claimant?
    3. Was the enrichment unjust?
    4. Does the defendant have a defense?
    5. What remedies are available to the claimant?

The first point would basically be extremely difficult to argue against w.r.t the big AI providers. They're making billions on the backs of other people's creative works.

On the third point, it can be argued that what AI developers are doing is almost criminally unjust - they are performing mass copyright violations by training their AI on works which they have no right to copy, and performing no creative acts themselves, besides specific neural network designs which are essentially useless without training data. Moreover, if you've explicitly given notice that your copyrighted works are not intended to be used for AI training, then it would be easier to argue that the AI company's enrichment is unjust because they've intentionally ignored it.

The fourth point is precisely the reason you want a disclaimer stating that the work should not be used for training. An AI developer could argue in their defense that the work is publicly available and carries no restrictions on use for training. However, that defense goes out of the window if you have made it clear that the work should not be used for such a purpose, and the developer (or the AI itself) can reasonably be expected to be aware of this notice. They would need another defense besides "we didn't know".

So your claims in a court would largely come down to the second point: proving that their enrichment has come at your expense. This might be difficult for an individual, though it could be proven if your code is quite unique, an AI is basically regurgitating it, and this results in a loss for you. More likely, a claim against these AI companies would be a class action suit on behalf of many claimants, where they could reasonably demonstrate that the work produced by AI is not original, is infringing on their own creative works, and that the defendant has no defense because they intentionally ignored the disclaimer that the creative works were not to be used for such a purpose.

W.r.t the final point, there are two potential avenues for remedy if a claim were successful: one is that the claimants are financially compensated in proportion to damages; the other is that the AI developers are forced to stop using works which prominently display a "NO AI TRAINING" disclaimer. Any successful claim would also set a precedent, so AI companies would need to be more careful about the works they use for training.

If you went to court with such a claim, which is more likely to result in success: the case where you didn't put a notice forbidding use in AI training, or the case where you clearly put notices everywhere that the work should not be used for AI training?

Ideally it shouldn't be necessary to have such a disclaimer, as using whole copyrighted works for training is not "fair use." However, the pressure is currently on courts to permit the use of copyrighted works for AI for several reasons.

But rather than waiting for courts to decide whether AI training is "fair use", why not be proactive and just start asserting that your creative works are not intended for AI slop? Even if courts rule that copyrighted works not carrying any disclaimer against AI training are free to use for training, this doesn't necessarily imply that all copyrighted works are free to use for training if they explicitly state otherwise.

[1]:https://en.wikipedia.org/wiki/Restitution_and_unjust_enrichm...



