Modern AI tools such as ChatGPT owe their capabilities to immense amounts of training data. However, vendors like OpenAI typically do not disclose where that material comes from, and for good reason: there is evidence that they may have violated some websites’ terms of service when harvesting the content. The practice also raises the question of whether they are allowed to use copyrighted works without a license. The New York Times has shed some light on this in a detailed report, and a summary of its key findings is also available.
More articles on this topic:
- Reuters: Inside Big Tech’s underground race to buy AI training data
- Bloomberg: Adobe’s ‘Ethical’ Firefly AI Was Trained on Midjourney Images
- VentureBeat: Apple’s $25-50 million Shutterstock deal highlights fierce competition for AI training data
- The Hill: Schiff unveils AI training transparency measure
- Bloomberg: Adobe Is Buying Videos for $3 Per Minute to Build AI Model