Hacker News

Maybe I’m reading this wrong, but commercial use of comments is prohibited by the HN privacy and data policy. So is creating derivative works (so, technically, a vector representation is too).


From Legal | Y Combinator | Terms of Use | Conditions of Use [1]

[1] https://www.ycombinator.com/legal/#tou

  > Commercial Use: Unless otherwise expressly authorized herein or in the Site, you agree not to display, distribute, license, perform, publish, reproduce, duplicate, copy, create derivative works from, modify, sell, resell, exploit, transfer or upload for any commercial purposes, any portion of the Site, use of the Site, or access to the Site.

  > The buying, exchanging, selling and/or promotion (commercial or otherwise) of upvotes, comments, submissions, accounts (or any aspect of your account or any other account), karma, and/or content is strictly prohibited, constitutes a material breach of these Terms of Use, and could result in legal liability.

From [1] Terms of Use | Intellectual Property Rights:

  > Except as expressly authorized by Y Combinator, you agree not to modify, copy, frame, scrape, rent, lease, loan, sell, distribute or create derivative works based on the Site or the Site Content, in whole or in part, except that the foregoing does not apply to your own User Content (as defined below) that you legally upload to the Site.

  > In connection with your use of the Site you will not engage in or use any data mining, robots, scraping or similar data gathering or extraction methods.


Surely plenty of YC companies scrape whatnot for derivatives and everyone’s fine with that…


jokers, i love u so much, the fakers u are.


Certainly it is literally derivative. But so are my memories of my time on the site. And in fact I do intend to make commercial use of some of those derivations. I believe it should be a right to make an external prosthesis for those memories in the form of a vector database.


That’s not the same as using it to build models. You as an individual have the right to access this content as this is the purpose of this website. The content becoming the core of some model is not.


If it's OK to encode it in your natural neural net, why is it not OK to put it in your artificial one?


It's the same distinction as making a backup copy of a movie to your hard drive vs. redistributing it to other parties.


You mean like free speech for concepts and ideas? It's OK to think them but not to tell other people about them? LLMs are another media of thought exchange, in some ways worse and others better. Of course it's out of bounds from them to produce literal copies of copyrighted work. But as with a human brain it should be OK for artificial neural nets to learn from them and generate new work.


Because the humans involved have decided they don't want that.


This.

You can anthropomorphize all you want, but AI is not a human and the law will not see it as such.


Let's talk after you've read all hacker news comments. Meet back here in a thousand years?


I hired a company called OpenAI to do it for me. They're done, and brand new comments are also in its search, at least within a few minutes, try it. Is now good?

These modern brain prosthetics are darn good.


But they are not doing it for free. It's not as if being on a paid account means they remove the HN portion from the training data.

For a forum of users that's supposed to be smarter than Reddit's, we sure make ourselves out to be just as unsmart as those Reddit users are purported to be. Being unable to understand the intent/meaning of "for commercial use" is so mind-boggling it has to be intentional. What the purpose is, I'm still unclear on, though.


Now you're just changing the argument. The mental copy of HN you have, besides being incomplete, is not copyable or resaleable.


  > I hired a company called OpenAI to do it for me.

  >>> If it's OK to encode it in your natural neural net, why is it not OK to put it in your artificial one?
Well I guess that lines up. With that line of reasoning I have zero issue believing you outsourced your reading to them. You clearly aren't getting your money's worth.


> I believe it should be a right to make an external prosthesis

Sure and some people would want a "gun prosthesis" as an aid to quickly throw small metallic objects, and it wouldn't be allowed either.


> Certainly it is literally derivative.

I am not sure if it is that clear cut.

Embeddings are encodings of shared abstract concepts statistically inferred from many works or expressions of thoughts possessed by all humans.

With text embeddings, we get a many-to-one, lossy map: many possible texts ↝ one vector that preserves some structure about meaning and some structure about style, but not enough to reconstruct the original in general, and there is no principled way to say «this vector is derived specifically from that paragraph authored by XYZ».

Does the encoded representation of abstract concepts constitute a derivative work? If yes, then every statement ever made by a human being is a work derivative of someone else's, by virtue of learning how to speak in childhood – every speaker creates a derivative work of all prior speakers.

Technically, there is a strong argument against treating ordinary embedding vectors as derivative works, because:

- Embeddings are not uniquely reversible and, in general, it is not possible to reconstruct the original text from the embedding;

- The embedding is one of an uncountable number of vectors in a space where nearby points correspond to many different possible sentences;

- Any individual vector is not meaningfully «the same» as the original work in the way that a translation or an adaptation is.
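The lossy, many-to-one nature of the map is easy to demonstrate with a deliberately toy "embedding" – a hashed bag-of-words into a small fixed dimension. This is an assumption for illustration only, not how a real embedding model works, but the non-reversibility argument is the same: distinct texts land on the same vector, so the original cannot be recovered.

```python
import hashlib

DIM = 16  # tiny dimension, purely for illustration

def toy_embed(text: str) -> tuple:
    """Toy 'embedding': stable-hash each token into one of DIM buckets
    and count occurrences. The map is many-to-one: word order (and any
    tokens sharing a bucket) is discarded, so the original text cannot
    be reconstructed from the vector."""
    vec = [0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1
    return tuple(vec)

# Two different sentences, one identical vector: not reversible.
a = toy_embed("man bites dog")
b = toy_embed("dog bites man")
print(a == b)  # True
```

A real embedding model preserves far more semantic structure than this sketch, but it is still a compressive, non-invertible map in the same sense.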

Please note that this is a philosophical take, and it glosses over the legally relevant differences between human and machine learning, as the legal question ultimately depends on statutes, case law and policy choices that are still evolving.

Where it gets more complicated:

If the embedding model has been trained on a large number of languages, cross-lingual search becomes easy: you can search for an abstract concept in any language the model has been trained on. The quality of such search results across languages X, Y and Z will be directly proportional to the scale and quality of the training corpus in those languages.

Therefore, I can search for «the meaning of life»[0] in English and arrive at a highly relevant cluster of search results written in different languages by different people at different times, and the question becomes «what exactly has it been statistically[1] derived from?».

[0] Cross-lingual search is what I tried with my engineers last year, to our surprise and delight at how well it actually worked.

[1] In the legal sense, if one can't trace a given vector uniquely back to a specific underlying copyrighted expression, and demonstrate substantial similarity of expression rather than idea, the «derivative work» argument in the legal sense becomes strained.
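Mechanically, such a search is just nearest-neighbour lookup by cosine similarity in the shared vector space. In this sketch the index vectors are hypothetical stand-ins for what a multilingual model might produce (a real model would be called to embed the documents and the query); the point is only that documents about the same concept cluster together regardless of language.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical vectors a multilingual embedding model might assign:
# the English and German "meaning of life" documents sit close together,
# far from the unrelated recipe.
index = {
    "the meaning of life (en)": (0.9, 0.1, 0.0),
    "der Sinn des Lebens (de)": (0.8, 0.2, 0.1),
    "recipe for pancakes (en)": (0.1, 0.9, 0.2),
}

query = (0.85, 0.15, 0.05)  # pretend: embed("the meaning of life")

results = sorted(index, key=lambda doc: cosine(index[doc], query), reverse=True)
print(results)
```

With these toy numbers, both «meaning of life» documents outrank the recipe, which is exactly the cross-lingual clustering effect described above.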


Ha I was about to ask for all my comments to be removed as a joke. I guess I don’t have to.


To think that any company anywhere actually removes all data upon request is a bit naive to me. Sure, maybe I'm too pessimistic, but there's just not enough evidence these deletes are not soft deletes. The data is just too valuable to them.


Data of the few users who are privacy-aware and go through the hoops of requesting GDPR-compliant data deletion is not worth risking GDPR fines over.

Data of non-european users who just click the "delete" button in their user profile? Completely different beast.


But see, that requires two totally different workflows. It would just be easier to soft delete everything and tell everyone it's a hard delete.

I've never been convinced that my data will be deleted from any long term backups. There's nothing preventing them from periodically restoring data from a previous backup and not doing any kind of due diligence to ensure hard delete data is deleted again.

Who in the EU is actually going in and auditing hard deletes? If you log in and can no longer see the data because the soft delete flag prevents it from being displayed, and/or any "give me a report of the data you have on me" report comes back empty because of the soft delete flag, how does anyone prove their data was only soft deleted?


What would a company that does that, hypothetically, reply to a user who requests the data the company holds on them? Would they answer with the soft-deleted data, or say they have no data?


They would obviously say they don't have the data. And to keep that claim from being a "lie", the employees authorized to handle such requests would use software that obeys the soft delete flag and shows them "no data available", or something like "at the user's request, data deleted on YYYY-MM-DD HH:MM:SS". Who would know any different?
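The pattern described above takes a few lines to implement; here is a minimal sketch using an in-memory SQLite table (the table and column names are made up). Every user-facing read path filters on a `deleted_at` flag, so the data-access report looks identical whether the row was hard-deleted or merely flagged – while an unfiltered internal query still sees everything.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (user TEXT, body TEXT, deleted_at TEXT)")
db.execute("INSERT INTO comments VALUES ('alice', 'hello world', NULL)")

# "Delete": just set the flag; the row and its contents stay on disk.
db.execute("UPDATE comments SET deleted_at = datetime('now') WHERE user = 'alice'")

# Every user-facing read path filters on the flag, so the
# "what data do you have on me" report comes back empty...
report = db.execute(
    "SELECT body FROM comments WHERE user = 'alice' AND deleted_at IS NULL"
).fetchall()
print(report)  # []

# ...while an unfiltered internal query still sees everything.
internal = db.execute("SELECT body FROM comments WHERE user = 'alice'").fetchall()
print(internal)  # [('hello world',)]
```

Which is exactly why, from the outside, a soft delete behind a filtered report is indistinguishable from a hard delete.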


They will be fine until someone hacks their systems and leaks the data. Once someone finds their deleted data in a stolen data dump, it will be a mess.


That’s fake news from a hacker. Just look at the data we have. The data they say we have, we don’t. They clearly made it up. It works in politics, so why not in tech?


I wonder if it'd be okay to do an airdropped coin based on them? Does one exist? Maybe as a YC startup idea?


Someone better go tell Open AI


I think a number of lawsuits are in the process of teaching them that particular lesson.


Still waiting for anything resembling a penalty, been a long time now. 5 years?


I'm just wondering what gives HN, Reddit etc the right to our comments?

If anyone owns this comment it's me IMO. So I don't see any reason why HN should be able to sue anyone for using this freely available information.


With Reddit, at least it's the legal agreement you enter into with them by creating an account and using it.


But that is not necessarily enforceable in every region.


Apparently, very little is enforceable anywhere, based on what the tech companies have been getting away with.


So they own my comments because they said so.


No, they own it because you said so. You "provide them with a global, non-revocable license to do things with the content you submit" as per the agreement you accepted. You're not required to enter into this agreement, this was a totally optional thing you opted to do.


And they own the platform. And then you came to the platform (with those rules), and wrote your comment on it. So you agreed to the rules.

At least that is what the TOS usually says. You can always get around that by making your own service or the like.

Think of it like visiting a foreign country. Like it or not, their rules apply one way or another. If they can enforce them, anyway.


Yeah I get it.

I just don't understand the public outrage. Why is everyone so worried about this? I write stuff knowing it's publicly available, and I don't give a crap about HN or Reddit or whomever's claims to my writings.

As far as I'm concerned it's all public domain, so what if OpenAI trains on it? Why should that bother me? I just don't understand, it really just feels like a witch hunt, like everyone just wants to hate AI companies and they'll jump on any bandwagon that's against them no matter how nonsensical it is.


If you got replaced at the job you needed by ‘AI’, isn’t it salt in the wound that they used your comments that you wrote without it in mind (in part) to do it?

Why wouldn’t someone be mad about that?


Most of the time they are hardly penalties and look more like rounding errors to these companies


Not sure it's clear they will learn anything.... My impression was they were winning or settling these suits


But is that a reason to keep doing it? Is the penalty the only reason people hold back on doing bad stuff?


(Violation of HN Terms & Conditions || Violation of copyright) != "bad stuff"

(Violation of HN Terms & Conditions || Violation of copyright) = Potential penalty


(Violation of HN Terms & Conditions || Violation of copyright) - Potential penalty = Unsane Profits

So the equation still balances for them to not give a damn


Isn’t that basically how societies work? Different penalties, but some kind of penalties enforcing the boundaries of that society?


Not everyone agrees that certain things are bad.


Does profit outweigh the penalty?



