Wikipedia guide identifies telltale signs of AI-generated content

Wikipedia volunteers have compiled an extensive field guide for detecting AI-generated text on the online encyclopedia. The document lists dozens of writing patterns, formatting quirks and technical markers that signal undisclosed use of chatbots like ChatGPT.

WikiProject AI Cleanup publishes the guide as an advice page for editors. The list draws on observations from thousands of AI-generated articles and drafts submitted to Wikipedia since late 2022.

According to the guide, large language models tend to produce overly promotional language. AI-generated articles frequently describe ordinary topics as having “lasting impact” or “profound heritage.” A beetle species becomes significant for “contributing to ecological balance,” while a small town supposedly offers “stunning natural beauty” and a “rich tapestry” of culture.

These texts also often tack superficial analysis onto sentences with present participles: facts and events are described as “highlighting” or “underscoring” significance, even though only a conscious agent can do either. Articles may end with formulaic sections about challenges and future prospects, typically opening with phrases like “Despite its success, the subject faces challenges.”
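
To make the pattern concrete, a rough Python check for these verbal tics might look like the sketch below. The phrase list comes from the examples above; the participle pattern is an illustrative approximation, not part of the Wikipedia guide.

    import re

    # Puffery phrases quoted above (illustrative, not exhaustive).
    PUFFERY = [
        "lasting impact", "profound heritage", "rich tapestry",
        "stunning natural beauty", "contributing to ecological balance",
    ]

    # Rough approximation of participial editorializing: a comma followed by
    # "highlighting" or "underscoring" and a claim about significance.
    PARTICIPLE = re.compile(r",\s+(highlighting|underscoring)\s+(its|the)\b", re.IGNORECASE)

    def language_signals(text: str) -> list[str]:
        lowered = text.lower()
        hits = [phrase for phrase in PUFFERY if phrase in lowered]
        if PARTICIPLE.search(text):
            hits.append("participial editorializing")
        return hits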

Furthermore, AI chatbots struggle with Wikipedia’s markup language. The guide notes that models default to Markdown formatting instead of wikitext, producing broken code when editors paste responses directly. Hash symbols appear instead of equal signs for headings. Curly quotation marks replace straight ones.
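
Pasted Markdown can be caught with simple pattern checks. The patterns below are assumptions about typical residue rather than rules from the guide, and wikitext’s own numbered lists also begin lines with hash symbols, so any such check is noisy.

    import re

    # Wikitext headings look like "== History =="; Markdown headings like "## History".
    # Caution: wikitext numbered lists also start lines with '#', so this check
    # is only a noisy hint, not proof.
    MARKDOWN_HEADING = re.compile(r"^#{1,6} \S", re.MULTILINE)
    CURLY_QUOTES = re.compile(r"[\u201c\u201d\u2018\u2019]")  # “ ” ‘ ’

    def markup_signals(wikitext: str) -> list[str]:
        found = []
        if MARKDOWN_HEADING.search(wikitext):
            found.append("Markdown-style heading")
        if CURLY_QUOTES.search(wikitext):
            found.append("curly quotation marks")
        return found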

ChatGPT also leaves distinctive technical traces. The guide identifies placeholder code like “citeturn0search0” that appears when editors copy text from the chatbot interface. URLs may contain “utm_source=chatgpt.com” tracking parameters. References sometimes use invalid DOIs or ISBNs, pointing to non-existent sources.
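
These traces are mechanical enough to scan for automatically. A minimal sketch, assuming the placeholder always follows the “citeturn0search0” shape quoted above (other variants may exist), plus an ISBN-13 checksum that catches some malformed identifiers:

    import re

    # Leftover ChatGPT citation placeholders and link-tracking parameters.
    CITE_PLACEHOLDER = re.compile(r"citeturn\d+search\d+")
    CHATGPT_UTM = re.compile(r"utm_source=chatgpt\.com")

    def isbn13_valid(isbn: str) -> bool:
        """ISBN-13 checksum: digits weighted 1,3,1,3,... must sum to 0 mod 10.
        A failing checksum proves the ISBN is malformed; a fabricated ISBN can
        still pass, so real verification means looking the source up."""
        digits = isbn.replace("-", "").replace(" ", "")
        if len(digits) != 13 or not digits.isdigit():
            return False
        return sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits)) % 10 == 0

    def trace_signals(text: str) -> list[str]:
        found = []
        if CITE_PLACEHOLDER.search(text):
            found.append("ChatGPT citation placeholder")
        if CHATGPT_UTM.search(text):
            found.append("chatgpt.com tracking parameter")
        return found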

The document warns against relying solely on AI detection tools, which have significant error rates. Many listed indicators also appear in human writing, since language models train on text written by people. No single sign proves AI use.

Editors should look for combinations of signs rather than isolated occurrences. Even when surface problems get fixed, deeper issues remain. AI-generated content often lacks the specific details and unusual facts that make articles useful, replacing them with generic statements that could describe many topics.
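
In code, that advice amounts to requiring several independent categories of evidence before flagging anything. A toy illustration, in which the three signals and the threshold are arbitrary assumptions chosen for brevity:

    import re

    def review_needed(text: str) -> bool:
        # One signal per category; no single sign proves AI use.
        signals = [
            "rich tapestry" in text.lower(),                     # puffery
            bool(re.search(r"^#{1,6} \S", text, re.MULTILINE)),  # Markdown residue
            "utm_source=chatgpt.com" in text,                    # ChatGPT link tracking
        ]
        return sum(signals) >= 2  # threshold chosen arbitrarily for illustration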

Three specific markers allow immediate deletion under Wikipedia policy: chatbot replies addressed to the user, such as “I hope this helps”; knowledge-cutoff disclaimers stating that information is current only up to a certain date; and completely fabricated citations.
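
The first two of those markers are easy to search for mechanically, though exact phrasings vary; the patterns below are assumptions based on common chatbot boilerplate, and fabricated citations still require checking sources by hand.

    import re

    CHAT_REPLY = re.compile(r"I hope this helps|Let me know if", re.IGNORECASE)
    CUTOFF = re.compile(
        r"as of my (last |latest )?(knowledge |training )?(cutoff|update)",
        re.IGNORECASE,
    )

    def deletion_markers(text: str) -> list[str]:
        found = []
        if CHAT_REPLY.search(text):
            found.append("reply addressed to the user")
        if CUTOFF.search(text):
            found.append("knowledge-cutoff disclaimer")
        return found  # fabricated citations cannot be detected with a regex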

The guide acknowledges that some patterns reflect broader internet writing trends. For example, Microsoft Word and mobile devices add curly quotes automatically. Letter-like formatting with greetings and closings does not prove AI use by itself. Perfect grammar alone means nothing, as many Wikipedia editors are skilled writers.

ChatGPT launched to the public on 30 November 2022. The guide notes that text added to Wikipedia before that date cannot be AI-generated, despite occasional coincidental matches with listed patterns.
