Study suggests OpenAI trained GPT-4o on paywalled O'Reilly books

A new study by the AI Disclosures Project claims OpenAI likely trained its GPT-4o model on paywalled books from O’Reilly Media without a licensing agreement. Researchers Tim O’Reilly, Ilan Strauss, and Sruly Rosenblat analyzed how well different OpenAI models recognized content from O’Reilly books. As reported by Kyle Wiggers for TechCrunch, the team found that GPT-4o showed significantly stronger recognition of paywalled O’Reilly content than older models such as GPT-3.5 Turbo. The researchers used a method called DE-COP, which tests whether a model can distinguish human-authored texts from paraphrased, AI-generated versions of the same text. While not conclusive proof, the findings suggest OpenAI may be increasingly using non-public books to train its more sophisticated AI models. OpenAI did not respond to requests for comment on these allegations.
