The debate over the source of training data for AI tools is heating up

Modern AI tools such as ChatGPT owe their capabilities to immense amounts of training data. However, vendors like OpenAI typically do not disclose where they obtain this material, and they have reason not to: there is evidence that they may have violated some websites’ terms of service when harvesting the content. The practice also raises the question of whether they may use copyrighted works without a license. The New York Times has shed some light on this in a detailed report. There is also a summary of the key findings.
