Father, Hacker (Information Security Professional), Open Source Software Developer, Inventor, and 3D printing enthusiast

  • 0 Posts
  • 5 Comments
Joined 1 year ago
Cake day: June 23rd, 2023

  • Except there’s nothing illegal about scraping all the content from websites (including news sites) and putting it into your own personal database. That is, after all, how search engines work.

    It’s only illegal if you then distribute that copyrighted material without the copyright owner’s permission, because that’s what copyright is all about: distribution.

    The news sites distributing the content in this case freely gave it to OpenAI’s crawlers. It’s not like OpenAI broke into these organizations to copy their databases of news articles.

    For the news sites to have a case, they need to demonstrate that OpenAI is creating a “derivative work” using their copyrighted material. However, that’s going to be a tough sell to judges and juries, since the way LLMs work is not so different from the way humans work: they take in information and then produce similar information (by predicting the next word/symbol, given a series of tokens, i.e. a prompt).

    If you read all of Stephen King’s books, for example, you might be better at writing horror stories. You may even start writing in a similar style! That doesn’t mean you’re violating his copyright by producing similar stories.
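
To make the "predicting the next word" point above a bit more concrete, here is a toy sketch in Python. It is not how OpenAI's models are built: real LLMs use neural networks over sub-word tokens, while this just counts which word follows which. The function names (`train_bigram`, `predict_next`, `generate`) and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]


def generate(follows, prompt, max_words=10):
    """Extend the prompt one word at a time using the learned counts."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


# Hypothetical training text, standing in for "everything the model has read".
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
```

What the model keeps after training is a table of word-to-word counts, and the output is produced by repeatedly asking "what usually comes next?". A model this tiny trained on one sentence will mostly just echo that sentence back, so take it only as a picture of the prediction step, not of how a large model behaves.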