The Wikimedia Foundation is addressing the rise of artificial intelligence on two fronts: it has released a human rights assessment of AI's potential impact on its projects, while a separate initiative restructures its data to make it more useful for AI developers.
Assessing Potential Risks
The foundation published a Human Rights Impact Assessment (HRIA) analyzing how AI could affect projects like Wikipedia. The assessment, conducted by Taraaz Research, identifies potential future risks rather than documenting existing harms. These risks include:
- In-house AI tools could amplify existing biases in knowledge representation.
- External generative AI could enable large-scale disinformation or attacks on volunteers.
- Downstream use of Wikipedia content to train large language models could lead to biased or inaccurate AI-generated outputs.
The foundation stated that it will work with its volunteer communities to discuss the findings and develop policies to mitigate these potential risks.
Providing Data for AI
In a separate initiative, Wikimedia Deutschland, the foundation's German chapter, launched the "Wikidata Embedding Project" in collaboration with Jina.AI and DataStax. The project restructures Wikipedia's data so that AI systems can query it more easily, particularly for retrieval-augmented generation (RAG). Vector-based semantic search lets AI models match queries to data by meaning and context rather than by exact keywords, giving developers a reliable, fact-oriented source for grounding their models and improving accuracy.
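To illustrate the core idea behind vector-based retrieval for RAG, here is a minimal sketch. The hand-made three-dimensional vectors and sample facts below are purely illustrative stand-ins; a real system like the Wikidata Embedding Project would use a trained embedding model and a vector database rather than these toy values.

```python
# Minimal sketch of vector-based retrieval for RAG: facts and queries are
# mapped to vectors, and retrieval returns the facts closest to the query.
# The embeddings here are hand-made toy vectors (dimensions loosely mean
# "author", "landmark", "encyclopedia"); real systems use a learned model.
import numpy as np

facts = {
    "Douglas Adams was an English author.":          np.array([0.9, 0.1, 0.0]),
    "The Eiffel Tower is in Paris.":                 np.array([0.0, 1.0, 0.1]),
    "Wikipedia is a volunteer-edited encyclopedia.": np.array([0.1, 0.0, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: how closely two vectors point in the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, k: int = 1) -> list[str]:
    """Return the k facts whose vectors are most similar to the query vector."""
    ranked = sorted(facts, key=lambda f: cosine(facts[f], query_vec), reverse=True)
    return ranked[:k]

# Toy embedding of a question about an author; it lands nearest the "author"
# fact, whose text would then be placed in the LLM prompt to ground the answer.
query = np.array([1.0, 0.0, 0.1])
print(retrieve(query))  # -> ['Douglas Adams was an English author.']
```

The design point is that retrieval happens by geometric proximity in embedding space, so a question phrased in entirely different words can still surface the relevant fact, which is what makes structured sources like Wikidata useful for grounding generative models.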
Sources: Wikimedia Foundation, TechCrunch