Cloudflare: Perplexity uses stealth crawlers to bypass website rules

The IT security company Cloudflare claims that the AI answer engine Perplexity is using undeclared web crawlers to access content on websites that explicitly forbid it. According to a blog post authored by several Cloudflare engineers, Perplexity ignores standard protocols designed to respect the preferences of website owners.

Cloudflare conducted an experiment to verify the behavior. The company set up new, private websites and used a robots.txt file, the standard way for site owners to tell web crawlers which parts of a site they may access, to prohibit all automated access. Despite these measures and additional firewall rules blocking Perplexity's known crawlers, the AI engine was reportedly still able to retrieve and summarize the restricted content in detail.
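To illustrate what "prohibiting automated access" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module. It shows how a crawler that honors robots.txt would check a blanket-deny file before requesting a page. The robots.txt content, the user-agent strings, and the URL are hypothetical; the article does not give the exact file Cloudflare used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that forbids every crawler from every path.
# The exact file used in Cloudflare's test is not given in the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler identifying as user_agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical agents: a declared bot and a browser-like string.
    for agent in ("ExampleBot/1.0", "Mozilla/5.0"):
        # A compliant crawler runs this check before each request
        # and skips the page when the answer is False.
        print(agent, allowed(agent, "https://example.com/private-page"))
```

Because the file uses `User-agent: *`, every agent is refused. The check is purely voluntary, though: robots.txt has no enforcement mechanism, so a crawler that never runs it, or that disguises itself as a browser, is not stopped by the file alone. That is why the test also relied on firewall rules.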

The investigation found that when Perplexity’s declared crawler was blocked, a “stealth” crawler took over. This crawler allegedly impersonated a standard web browser and used a rotating series of unlisted IP addresses to hide its identity and evade detection. Cloudflare states this activity was observed across millions of daily requests.

In contrast, the company notes that other AI companies, such as OpenAI, adhere to website directives. In a similar test, OpenAI’s crawler respected the robots.txt file and ceased activity when blocked. As a result of its findings, Cloudflare has removed Perplexity from its list of verified bots and implemented new rules to block the observed stealth crawling for its customers.
