Secret book scanning project reveals AI industry’s data hunger

Anthropic spent tens of millions of dollars to acquire, scan, and physically destroy millions of books in order to train its Claude AI chatbot. The company cut off the books' spines and scanned the pages in a covert initiative called “Project Panama.”

Aaron Schaffer of The Washington Post details the operation, drawing on more than 4,000 pages of court documents unsealed in a copyright lawsuit. Internal records show Anthropic wanted to keep the project quiet, stating “we don’t want it to be known that we are working on this.”

The company hired Tom Turvey, a Google veteran from the Google Books project, to lead the effort. Anthropic purchased books in batches of tens of thousands from retailers including Better World Books and World of Books. Documents describe using a hydraulic cutting machine to slice books before scanning them on high-speed scanners. The physical books were then recycled.

Court filings also reveal that Anthropic co-founder Ben Mann downloaded pirated books from LibGen, a shadow library, in June 2021. Similar practices came to light at Meta and OpenAI, where employees raised concerns about downloading pirated content.

A judge ruled that using books for AI training qualifies as fair use under copyright law because it transforms the material. However, Anthropic agreed to pay $1.5 billion to settle claims about how it initially acquired books through pirated sources. Authors affected by the downloads can claim approximately $3,000 per title.
