OpenAI Ordered To Produce 20 Million User Conversations To NY Times

OpenAI has been ordered by a federal judge to turn over 20 million anonymized ChatGPT user logs to the NY Times and other newspapers suing the chat giant over its generative AI model. 

In a Nov. 7 order revealed today, New York Magistrate Judge Ona T. Wang said producing the logs in whole is appropriate – granting the plaintiffs’ motion to compel production. The newspapers had demanded the user logs to inspect how ChatGPT is used to create outputs they say infringe their copyrighted works. OpenAI pushed back, citing privacy concerns. 

Wang, however, was not persuaded by OpenAI’s argument that consumers’ privacy rights were at risk, given that a protective order is in place and identifying information would be removed from the logs (so anyone who’s uploaded their tax return or a legal document is safe?).

OpenAI has until Nov. 14 to hand over the data – the latest twist in the hotly contested discovery process in the newspapers’ copyright lawsuits against OpenAI, Bloomberg Law reports. 

OpenAI had contested the wholesale production of the 20 million user logs and asked to narrow the sample, saying in an Oct. 30 briefing that the ask was inappropriate and would disclose private user conversations that had nothing to do with the copyright issue in the case.

Newspaper-plaintiffs including New York Times Co., however, pushed back and said without the user logs they couldn’t conduct expert analysis on topics such as how ChatGPT worked to pull news content for its users or how often the AI model hallucinated and generated false outputs attributed to the outlets. –BBG

The fight over user logs dates back to April – before lawsuits against OpenAI by various news outlets were consolidated for pretrial proceedings. In May, Wang issued a preservation order, rejecting OpenAI’s argument that the request was “sweeping” and “invasive.”

The NYT’s lawsuit, filed in Dec. 2023, claims that OpenAI and Microsoft violated copyright laws by using the Times’ content to train their AI models, including ChatGPT and Microsoft’s Copilot.

“Times journalism is the work of thousands of journalists, whose employment costs hundreds of millions of dollars per year,” reads the complaint. “Defendants have effectively avoided spending the billions of dollars that The Times invested in creating that work by taking it without permission or compensation.”

The lawsuit has potentially huge implications for ‘fair use’ of copyrighted materials, a complex legal doctrine that weighs factors such as the purpose of use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for the copyrighted work.

The legal landscape surrounding generative AI is unsettled, with the technology still in its early days. There are other lawsuits that could test the rights of AI companies to “scrape” content from the web to train AI tools, including one by several prominent book authors against OpenAI. In February, Getty Images sued the AI art company Stability AI in Delaware, alleging that it had infringed on Getty’s copyrights. Stability AI at the time said it doesn’t comment on pending litigation. –WSJ

According to the NYT, AI tools developed by Microsoft and OpenAI have significantly increased their valuations due to the data ‘scraped’ for training.

Tyler Durden
Wed, 11/12/2025 – 10:10
ZeroHedge News

Author: VolkAI