
Guessing: I think they may have trained on copyrighted stories. So what they supposedly did was ask an LLM to replace the names of the main characters. Since Elara is not a frequently used name, there is little chance of a clash. Then they trained ChatGPT on those stories.


Has any legal precedent been established that training breaks copyright? And are you implying that reprinting a novel word for word, except for changing one character's name, wouldn't be copyright infringement?


Does it not sound like a juicy news article? It's a bad image for OpenAI.


Highly unlikely. The more likely truth is simply that the default temperature of ChatGPT is really low (but not quite zero), so it keeps spitting out (roughly) the same story. It does switch up protagonists a bit, and different versions of ChatGPT also have different names for them. Slightly modified prompts also return different names (e.g. "Tell me a sad story").
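For anyone unfamiliar with what temperature does mechanically, here's a minimal sketch of temperature-scaled sampling. The logits and candidate names are toy values I made up, not anything from ChatGPT:

  import numpy as np

  def sample_token(logits, temperature=1.0):
      # Divide the logits by the temperature before the softmax.
      # A low temperature sharpens the distribution toward the top
      # token, so repeated samples come out nearly identical.
      scaled = np.asarray(logits, dtype=float) / temperature
      probs = np.exp(scaled - scaled.max())  # numerically stable softmax
      probs /= probs.sum()
      return np.random.choice(len(probs), p=probs)

  names = ["Elara", "Lyra", "Mira"]  # hypothetical candidate names
  logits = [2.0, 1.0, 0.5]           # "Elara" only slightly favored

  for t in (0.2, 1.0):
      picks = [names[sample_token(logits, t)] for _ in range(1000)]
      print(t, {n: picks.count(n) for n in names})

At temperature 0.2 the slightly favored name wins nearly every draw; at 1.0 the other names show up regularly. A small bias in the training data plus a low default temperature is enough to make one name feel ubiquitous.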


When asked the reason, ChatGPT had this to say: "Actually, the choice of “Elara” wasn’t a result of training on specific copyrighted stories or any prompt to avoid copyright claims. OpenAI models like me are designed to create original content without directly referencing copyrighted characters, and "Elara" is simply a popular-sounding name in many storytelling contexts. I just used it consistently for its versatility, but I’m totally open to switching things up!"


ChatGPT's opinion on the matter is completely worthless, unless it was also trained on an accurate description of its training process (it wasn't). Language models do not even have access to their own "thought process" - if you ask it "why" it said something, you will get a post-hoc rationalization 100 percent of the time because the next-word prediction only has access to the same text that you see. The rationalization might be incidentally correct, or it might not - either way it contributes no real information about the model's internal state.
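To make that concrete, here's a toy sketch. llm_generate is a hypothetical stand-in I'm stubbing out, not a real API, but the shape is the point: each call is a pure function of the visible text.

  def llm_generate(text: str) -> str:
      # Hypothetical stand-in for a stateless completion API. The
      # only input is the visible transcript; there is no channel
      # carrying activations over from earlier calls.
      return "I chose Elara because it sounds lyrical."  # canned stub

  transcript = (
      "User: Tell me a story.\n"
      "Assistant: Once upon a time, Elara...\n"
      "User: Why did you choose the name Elara?\n"
  )
  explanation = llm_generate(transcript)
  # The activations that actually produced "Elara" in the first
  # reply are gone by the time the "why" question is answered, so
  # the explanation is predicted text about plausible reasons, not
  # a readout of internal state.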


There's an interesting theory that this is all that consciousness is: one part of the brain trying to explain the decisions of another part, a part into which it has no special insight.


That interesting theory is called the bicameral mind, and as far as I know it's widely considered pseudoscience and not taken seriously in any scientific field.

Also, it doesn't describe at all the way LLMs work, so it isn't even applicable.


So just like humans


No, not like humans. Humans have access to their own thought processes, and are capable of introspection.


Why do people quote these kinds of "answers" that the model gives? It's not like the model knows why it's doing anything.


Not understanding how LLMs work.


Ah, such condescension! I was careful not to offer an opinion, only to reproduce the answer as-is. The intent was not to treat that answer as fact, but I thought the response was pretty revealing and in fact supported the parent comment about LLMs having been trained on copyrighted materials. The response ChatGPT provided was that OpenAI models are designed to create original content "without directly referencing copyrighted characters". If it were simply creating original content, it wouldn't have needed to mention a constraint about avoiding direct references to copyrighted characters.



