StackFox Research · January 2026 · Updated February 2026
Who's Blocking AI?
We analyzed 10,000 top websites. Here's what we found.
Update (February 2026): We re-ran our analysis on 7,636 top websites from the Tranco list. Blocking is up across the board. OpenAI now gets blocked more than Anthropic, reversing the January trend.
A year ago, almost nobody blocked AI crawlers. Now nearly one in four top websites do. That's the headline. But the real story is in who's blocking, and who isn't.
Two webs are emerging
The data shows a clear split. Media companies and publishers block AI training at high rates. SaaS companies and developer tools mostly don't.
Media companies' content is what makes AI useful. Block it, and you protect your moat. Developer tools want to be recommended by AI. Embrace it, and you gain distribution.
The heaviest blockers
Some sites block nearly every AI crawler we track. News organizations and media companies lead the charge. The numbers have grown since January as sites add more bots to their blocklists.
OpenAI now gets blocked more than Anthropic
In January, Anthropic led in blocks because it operates more crawler names (ClaudeBot, anthropic-ai, Claude-Web, Claude-User). More names, more chances to end up on a blocklist.
By February, OpenAI overtook them. As more sites adopt comprehensive AI blocklists, the operator with the biggest name recognition gets targeted first. 62% of sites with AI rules now block at least one OpenAI bot, up from 30% in January.
Google's AI crawlers
Google operates several AI-specific crawlers separate from regular Googlebot, each controlled by its own user-agent token in robots.txt.
51% of sites with AI rules block at least one Google AI crawler. Most target Google-Extended specifically, letting regular Googlebot through for search indexing.
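In practice that usually looks like a robots.txt group naming Google-Extended and nothing else from Google. An illustrative snippet, not taken from any particular site:

    # Opt out of Google's AI training use only
    User-agent: Google-Extended
    Disallow: /

Because Googlebot never matches this group, it keeps its default access and the site stays in ordinary search results.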
Training bots vs search bots
Both OpenAI and Anthropic now run multiple crawlers. Training bots (GPTBot, ClaudeBot) collect data to improve models. Search bots (OAI-SearchBot) power real-time features like ChatGPT browsing.
Sites want to appear in AI search results. That's traffic. They don't want their content used to train the next model version. But the gap is narrowing. Search bot blocking jumped from 12% to 20% in just one month.
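A robots.txt that follows this pattern disallows the training crawlers and explicitly admits the search one. A hypothetical example; real blocklists typically name many more bots:

    # Keep model-training crawlers out
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    # Let the crawler behind AI search and browsing features in
    User-agent: OAI-SearchBot
    Allow: /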
The irony: AI companies blocking AI
Even AI companies block each other's crawlers. ChatGPT.com still leads with 10 AI bots blocked. Most other AI company sites either don't have a robots.txt at all or don't block other AI crawlers.
llms.txt adoption is still low
While 28% of sites have AI rules in their robots.txt, only 6% have llms.txt, the new standard for communicating with AI instead of blocking it. That's down from 10% in January, more likely a difference in sampling than a real decline.
The adopters are still mostly tech companies: Adobe, Shopify, Stripe, WordPress, Dropbox, PayPal, Nvidia. They see AI as a distribution channel. When ChatGPT recommends "use Stripe for payments," that's valuable.
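For context, the proposal at llmstxt.org describes llms.txt as a Markdown file served at the site root: a title, a one-line summary, and sections of curated links pointing AI systems at LLM-friendly versions of key pages. A minimal, hypothetical example:

    # Example Payments Co
    > Hosted payments APIs and SDKs for developers.

    ## Docs
    - [API reference](https://example.com/docs/api.md): endpoints, auth, and errors
    - [Quickstart](https://example.com/docs/quickstart.md): accept a first payment

    ## Optional
    - [Changelog](https://example.com/changelog.md)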
What happens next
Robots.txt was designed in 1994 for a simpler web. Now it's handling questions its creators never imagined. What's the difference between training and inference? Between a search engine and a chatbot?
New standards will emerge. Licensing deals will multiply. Media companies blocking today may become invisible to users who discover content through AI. Developer tools embracing AI may get recommended by every assistant.
Methodology
January 2026: We analyzed 10,000 top websites from the Tranco list, fetching robots.txt and llms.txt from each. February 2026: We re-ran the analysis on 7,636 top sites (TOP_100, TOP_1K, TOP_10K tiers), of which 1,912 have robots.txt. Percentages are of sites with robots.txt. We parsed for 95+ known AI crawler user-agents across OpenAI, Anthropic, Google, Meta, Perplexity, ByteDance, Apple, and others. Bots were categorized by purpose: training, search, and assistant. Company-level blocking is calculated as the percentage of sites with any AI rules that block at least one bot from that company.
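For readers who want to spot-check a domain themselves, the core test, whether a site's robots.txt disallows a given AI crawler, can be sketched with Python's standard library. The bot list below is an illustrative handful of the 95+ user-agents we tracked, and the script is a simplified sketch, not our full pipeline:

    # Sketch only: report which tracked AI crawlers a domain's robots.txt
    # disallows from "/". Bot list is a small illustrative subset.
    import urllib.request
    import urllib.robotparser

    AI_BOTS = {
        "GPTBot": ("OpenAI", "training"),
        "OAI-SearchBot": ("OpenAI", "search"),
        "ClaudeBot": ("Anthropic", "training"),
        "Google-Extended": ("Google", "training"),
        "PerplexityBot": ("Perplexity", "search"),
    }

    def blocked_bots(domain):
        """Return the tracked AI bots that may not fetch the site root."""
        parser = urllib.robotparser.RobotFileParser()
        try:
            with urllib.request.urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except OSError:
            return []  # missing or unreachable robots.txt: no rules, nothing blocked
        parser.parse(body.splitlines())
        root = f"https://{domain}/"
        return [bot for bot in AI_BOTS if not parser.can_fetch(bot, root)]

    if __name__ == "__main__":
        print(blocked_bots("example.com"))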