The debate over the source of training data for AI tools is heating up

Modern AI tools such as ChatGPT owe their capabilities to immense amounts of training data. However, vendors like OpenAI typically do not disclose where they obtain this material, and they have reason not to: there is evidence that they may have violated some websites’ terms of service when harvesting content. The practice also raises the question of whether they may use copyrighted works without a license. The New York Times has shed some light on this in a detailed report. There is also a summary of the key findings.

More articles on this topic:

Reuters: Inside Big Tech’s underground race to buy AI training data
Bloomberg: Adobe’s ‘Ethical’ Firefly AI Was Trained on Midjourney Images
VentureBeat: Apple’s $25-50 million Shutterstock deal highlights fierce competition for AI training data
The Hill: Schiff unveils AI training transparency measure
Bloomberg: Adobe Is Buying Videos for $3 Per Minute to Build AI Model
