I feel like we're part of a dying generation or something. I keep seeing people who want to post content to the public internet, but they still want to own the data somehow, and control who sees it, but still on the public internet.
I'm not sure how it's supposed to work, as I see the public internet as just that, a public square. What goes there can be picked up by anyone, for any purpose. If I want something to be secret, I don't put it on the public internet.
Gonna be interesting to see how that "public but in my control" movement continues to evolve, because it feels like they're climbing an impossible wall.
In the music biz, copyright is sometimes used to prevent very specific uses of material. Recent examples include a number of musicians (or perhaps their labels or publishing houses) denying the use of certain pieces of music at rallies for particular politicians.
IANAL, but my read of this is that if the content has the appropriate licence, the licence holder can withhold certain rights & access from certain groups of potential licensees. I'm loosely aware that the common open source licences are highly permissive, so probably they can't be used in this way... but presumably not all licences are like that. So, even though the work is "public", it should still be possible to enforce subsets of rights.
And to take your "public square" analogy... before we had cameras everywhere, there was some expectation of "casual" privacy even in public spaces. Not everyone in the square hears everything said by everyone else in the square. The fact that digital tools make privacy breaches much easier doesn't mean they should be tolerated.
(that said, I'm fairly careful what I publish online)
You have to actually utilize the legal system. Others suggest attaching licenses to their published content/data to prevent AI training, but attaching the license alone does nothing. You have to actually drag someone to court once in a while for it to work.
It's the same complaint as with the GDPR: if it works, why are sites still doing X/Y/Z... Well, because all people do is complain online. You need to report violations and be prepared to take legal action.
> legal complaints only work against people in your country and those it works with.
Well yes. Process Service tends to be quite different from country to country, and if you don't know how it works there, you probably won't be able to make vague "legal complaints" and have them taken seriously.
If you really want to make a legal complaint in Russia, I would suggest you look into what is called a Process Service Specialist who has specific experience with Arbitrazh.
I get what you're saying, and I do question the value of suing someone in Russia, or China, but did you actually get a lawyer to file an actual real lawsuit in Russia? Again, Russia, probably not really going to work.
There's absolutely no reason why you couldn't drag OpenAI to court... you'd need a ton of money, but you could and if you win, then the rest of the AI companies are going to get very busy adjusting their behaviour.
Though that also goes towards those posting content. Systems that generate an infinite number of permutations for viewing the same information are a poor design. It can easily lead to even conscientious people discovering that a simple attempt to slowly mirror a website overnight with wget has resulted in some rabbit hole of forum view parameter explosion.
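A minimal sketch of one way a mirroring script might avoid that rabbit hole, assuming the forum's view-only parameters are known in advance: canonicalize each URL before queueing it, so that permutations of the same page collapse into one. The parameter names in `VIEW_ONLY_PARAMS` and the `canonical_url` and `dedupe` helpers are hypothetical, not anything wget itself provides.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical presentation-only parameters; a real forum needs its own list.
VIEW_ONLY_PARAMS = {"sort", "order", "view", "highlight", "sid"}


def canonical_url(url: str) -> str:
    """Drop view-only query parameters and sort the remaining ones."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VIEW_ONLY_PARAMS
    )
    # The fragment is discarded as well: it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def dedupe(urls):
    """Return canonical URLs in first-seen order, skipping repeats."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

With this, `thread?id=7&sort=asc`, `thread?sort=desc&id=7` and `thread?id=7&view=flat` would all be fetched once instead of three times. It only helps if the ignored parameters really are cosmetic, which is an assumption the crawler author has to check per site.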
I'm one of those people. I think it comes down to intent. Those who do this extend an implicit goodwill: they trust that the data won't be abused and that the infrastructure behind it won't be overwhelmed (self-hosting). "Big Tech" just makes this worse, because their motivations aren't the same as ours (small web).
The public internet is a public square, but does it have to be some caricature of 1970s Times Square dominated by dealers, pimps, and thugs all looking to extract whatever they can from me?
Once all devices are locked down and all control has been taken away from the users, we can finally have functional DRM on every device and make this dream come true.
> I feel like we're part of a dying generation or something
Well... yes. Just like every living thing that's ever existed ;)
But even copyright doesn't give you "control" over something. I can still download it and use it however I want, privately; I'm just legally limited in distributing it further. Unless I change it sufficiently, then it's again OK to distribute my changed one. Of course, depending on country and whatnot.
Problem remains the same, as soon as the content is visible on the public internet, you lose 100% control of its propagation, illegal or not.
> I can still download it and use it however I want, privately; I'm just legally limited in distributing it further
the party that allowed you to download it committed copyright infringement by distributing it. the same restrictions apply to them as to you*. a copyright holder may well want that party to stop distributing it, which means no-one else can download it.
it’s just moving the ‘control’ up a level to a different party. it’s still there.
> as soon as the content is visible on the public internet, you lose 100% control of its propagation, illegal or not.
GPL license: thou shall always* provide access to this software’s source code —> form of control in the opposite direction to music copyright, but still a form of control
* subject to terms, conditions, locale and other legal things (IANAL)
> the party that allowed you to download it committed copyright infringement by distributing it. the same restrictions apply to them as to you*. a copyright holder may well want that party to stop distributing it, which means no-one else can download it.
That's why every social media site in existence puts terms in its EULA demanding that users grant the site a blanket license to redistribute their content, over and above any separate licenses they may put on it. After it's been redistributed to third parties, the copyright holder has no more control (at least, not via copyright law) over how those copies are privately used.
E.g., on HN: "With respect to the content or other materials you upload through the Site or share with other users or recipients (collectively, 'User Content'), you represent and warrant that you own all right, title and interest in and to such User Content, including, without limitation, all copyrights and rights of publicity contained therein. By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed."
If you post something on a physical bulletin board, you expect people will come by and read it
If a bunch of "scrapers" come by and form a massive mob so that nobody else can read it, and then cover the board with slightly different copies attributed to themselves, that isn't exactly the "public square" you imagined
> If you post something on a physical bulletin board, you expect people will come by and read it
Or birds, or public cameras, or people who take photos next to it, or people archiving bulletin boards, or...
If I put stuff in public spaces, I expect anyone and anything to be able to read, access and store it. Basically how I treat this very comment, and everything else I put on the public internet.
Eh, the internet is a weirdly semi-public space. Think of it more like a mall parking lot than a public common. If you put something up for notice there it will probably be fine. For example a number of grocery stores around me have boards for things just like that. But the moment it becomes a public nuisance for them they will trespass your ass outta there faster than a starved dog would eat a dropped hotdog.
As you say, it's not public as in public road (and even that has police to enforce proper behavior), but more like publicly accessible but on private property.
When it's my server and I'm paying for it, then banning resource wasters is the right move.
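A rough sketch of what "banning resource wasters" could look like in practice, assuming a simple sliding-window request limit per client. The `RequestLimiter` class, its thresholds, and the idea of keying on a client string (an IP or user agent) are illustrative assumptions, not a description of how any particular host does it.

```python
from collections import defaultdict, deque


class RequestLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client -> timestamps of accepted requests

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over budget: reject (or escalate to a ban)
        hits.append(now)
        return True
```

A server would call `limiter.allow(client_id, time.monotonic())` per request and answer with an error when it returns `False`. Scrapers that rotate addresses get around a per-client key like this, so it is a starting point rather than a solution.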
You can’t have your cake and eat it, too. The power of the internet and mass media is that you can publish to the whole world, and make something public to billions of people. With that power comes side effects, such as, obviously, billions of people being able to privately do whatever they want with the information you, you know, published.
“public, but, no, not like that” isn’t a thing and no technological measure can make it a thing.
I don't know, this is like saying that because those bowls of mints at a restaurant are "free", I can back a trailer up to the door and start loading it up. Even if you know they'll never run out of mints.
I feel like you need to present a very strong case where LLMs are "the public" before you take such a weak position when interpreting the entirety of the article.
Drew makes it perfectly clear in TFA that "the public", as he sees it, is fully entitled to, and should, make use of the data SourceHut provides.
If the AI scrapers respected the robots.txt file then this wouldn't be an issue. A company is allowed to set the terms of service for their service and take action if other companies are abusing that.
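For reference, respecting robots.txt is not much work for a crawler. A minimal sketch using Python's standard `urllib.robotparser`, assuming a made-up robots.txt and a made-up crawler name (`ExampleAIBot`); the `build_parser` and `may_fetch` helpers are hypothetical wrappers for illustration:

```python
from urllib.robotparser import RobotFileParser

# Made-up policy: one named crawler is shut out entirely,
# everyone else is only kept away from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Ask the parsed rules whether this user agent may request the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
```

A well-behaved crawler would call `may_fetch` before every request and skip anything that comes back `False`. Nothing enforces this on the server side; it only works if the crawler chooses to ask.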
The point of a web host is to serve the users’ data to the public.
Anything else means the web host is broken.