One AI detection tool is powerful enough to sink careers, and nobody can fully trust it

An AI detection tool called Pangram has become the dominant force in identifying AI-generated writing, influencing decisions at publishers, universities, and scientific institutions. Matteo Wong reports for The Atlantic that Pangram has been used to flag a horror novel pulled before publication, articles in major newspapers, award-winning short stories, and portions of Pope Leo XIV’s encyclical on AI.

Pangram CEO Max Spero claims the tool incorrectly labels human text as AI-generated only once in every 10,000 cases. A University of Chicago study largely confirmed that figure for texts between 500 and 1,000 words. The tool is weaker at catching text that really is AI-generated: Spero himself pointed to data suggesting it mislabels AI content as human roughly once in every 70 cases.

Humanizers undermine detection

Tools known as AI “humanizers” further erode Pangram’s reliability. Wong tested one called Walter Writes AI and found that it consistently caused Pangram to label AI-generated articles as human-written. Pangram’s training relies on pattern recognition rather than explicit rules, making its reasoning opaque even to its own developers.

The stakes of errors are high. Journalist Taylor Lorenz was publicly accused of using AI for a Vanity Fair article. Spero later confirmed Pangram had made a mistake. A New York City high school teacher told Wong he doubts some students’ papers are fully human-written, yet Pangram rates them as 100 percent human. Accusing a student without firm evidence, the teacher noted, carries serious consequences either way.

Spero says Pangram should serve as a starting point for investigation, not a final verdict. But as the tool connects to platforms like Canvas and reaches tens of millions of students, even a tiny error rate can produce a large number of false accusations in absolute terms. Neuroscientist Tim Requarth, who teaches science writing at NYU, warns that AI detection will “wax and wane in its effectiveness for reasons we can’t predict.” Wong concludes that basing institutional rules on Pangram’s reliability is “like building a sandcastle at low tide.”
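
To make the scale argument concrete, here is a rough back-of-the-envelope sketch in Python. The two error rates are the ones quoted above (1 in 10,000 for human text flagged as AI, roughly 1 in 70 for AI text passed as human). The document counts are purely hypothetical round numbers chosen for illustration, not figures from Wong's reporting, and the sketch assumes the rates hold uniformly across all submissions, which the University of Chicago study only supports for texts of 500 to 1,000 words.

```python
from fractions import Fraction

# Error rates as quoted in the article.
FALSE_POSITIVE_RATE = Fraction(1, 10_000)  # human text labeled as AI
FALSE_NEGATIVE_RATE = Fraction(1, 70)      # AI text labeled as human


def expected_false_flags(human_documents: int) -> Fraction:
    """Expected number of human-written documents wrongly flagged as AI."""
    return human_documents * FALSE_POSITIVE_RATE


def expected_missed_ai(ai_documents: int) -> Fraction:
    """Expected number of AI-generated documents that pass as human."""
    return ai_documents * FALSE_NEGATIVE_RATE


if __name__ == "__main__":
    # Hypothetical volume: 10 million human-written submissions (assumption).
    human_docs = 10_000_000
    print(int(expected_false_flags(human_docs)))  # 1000
```

Under those assumptions, ten million honest submissions would still yield about a thousand wrongful flags, which is why Spero's framing of the tool as a starting point rather than a verdict matters.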
