AI crawlers overwhelm open source infrastructure

Free and open source software (FOSS) projects are facing severe infrastructure challenges due to aggressive crawling by AI companies. According to a report by Niccolò Venerandi, multiple FOSS projects have experienced outages and service disruptions from AI crawlers that ignore standard protocols like robots.txt.
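For context, robots.txt is a purely voluntary convention: a site lists the paths crawlers should avoid, but nothing enforces compliance, which is exactly why ignoring it is so disruptive. A minimal illustrative example (GPTBot is one real AI crawler token; the rules here are just a sketch):

```
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
```

Well-behaved crawlers fetch this file and honor it; the crawlers described in the report simply skip the check.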

SourceHut founder Drew DeVault reported that LLM crawlers are accessing expensive endpoints using random user agents from thousands of IP addresses, making them difficult to block. KDE’s GitLab infrastructure was recently overwhelmed by crawlers from Alibaba IP ranges, while GNOME, after experiencing similar issues since November, has deployed Anubis, a proof-of-work challenge that blocks AI scrapers.
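Proof-of-work gates of this kind work by making the client find a nonce whose hash meets a difficulty target before the server will render an expensive page: the cost is negligible for one human visitor but compounds for a crawler fetching millions of URLs. A minimal sketch of the idea in Python (not Anubis's actual implementation, which runs the solver in the browser):

```python
import hashlib
import secrets

DIFFICULTY = 4  # required number of leading zero hex digits

def issue_challenge() -> str:
    """Server side: hand the client a random challenge string."""
    return secrets.token_hex(16)

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce until the hash meets the target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single cheap hash confirms the client did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

challenge = issue_challenge()
nonce = solve(challenge)        # ~16**DIFFICULTY hashes on average
print(verify(challenge, nonce))
```

Raising `DIFFICULTY` by one multiplies the solver's average work by 16 while verification stays a single hash, which is the asymmetry that makes the scheme effective against bulk scraping.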

The problem extends beyond these projects. LWN, Fedora, Inkscape, Diaspora, and Read the Docs have all reported significant disruptions. Read the Docs noted that blocking AI crawlers reduced their traffic by 75%, saving approximately $1,500 monthly.

Beyond infrastructure strain, FOSS projects are also contending with AI-generated bug reports. Daniel Stenberg of the curl project described receiving credible-looking but hallucinated security vulnerability reports that waste developers' time to investigate.

These issues disproportionately affect FOSS projects, which typically have fewer resources than commercial products and maintain more publicly accessible infrastructure. Various mitigation strategies are being attempted, from country-wide IP blocks to shared blocklists of known AI crawler addresses, but a comprehensive solution remains elusive as crawlers continue to adapt their techniques to evade detection.
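Shared blocklists are usually applied as CIDR range matching at the network edge. A toy sketch of the lookup, assuming a list of CIDR strings (the ranges below are illustrative only, not a real blocklist):

```python
from ipaddress import ip_address, ip_network

# Illustrative CIDR ranges only; real deployments pull shared,
# community-maintained lists and reload them regularly.
BLOCKLIST = [ip_network(c) for c in ("47.74.0.0/15", "203.0.113.0/24")]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client falls inside any blocked range."""
    addr = ip_address(client_ip)
    return any(addr in net for net in BLOCKLIST)

print(is_blocked("203.0.113.42"))   # inside a listed range
print(is_blocked("198.51.100.7"))   # not listed
```

The weakness the article points to is visible here: a crawler that rotates through thousands of residential IPs simply never matches the listed ranges, which is why blocklists alone have not solved the problem.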
