StackFox Research · January 2026 · Updated February 2026
Who's Blocking AI?
We analyzed 10,000 top websites. Here's what we found.
Update (February 2026): We re-ran our analysis on 7,636 top websites from the Tranco list. Blocking is up across the board. OpenAI now gets blocked more than Anthropic, reversing the January trend.
A year ago, almost nobody blocked AI crawlers. Now nearly one in four top websites do. That's the headline. But the real story is in who's blocking, and who isn't.
Two webs are emerging
The data shows a clear split. Media companies and publishers block AI training at high rates. SaaS companies and developer tools mostly don't.
Media companies' content is what makes AI useful. Block it, and you protect your moat. Developer tools want to be recommended by AI. Embrace it, and you gain distribution.
The heaviest blockers
Some sites block nearly every AI crawler we track. News organizations and media companies lead the charge. The numbers have grown since January as sites add more bots to their blocklists.
OpenAI now gets blocked more than Anthropic
In January, Anthropic led in blocks because it operates more crawler names (ClaudeBot, anthropic-ai, Claude-Web, Claude-User). More names, more chances to end up on a blocklist.
By February, OpenAI overtook them. As more sites adopt comprehensive AI blocklists, the operator with the biggest name recognition gets targeted first. 62% of sites with AI rules now block at least one OpenAI bot, up from 30% in January.
Google's AI crawlers
Google operates several AI-specific crawlers separate from regular Googlebot, each controlled by its own user-agent token in robots.txt.
51% of sites with AI rules block at least one Google AI crawler. Most target Google-Extended specifically, letting regular Googlebot through for search indexing.
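In practice that usually looks like a robots.txt group naming Google-Extended and nothing else from Google. An illustrative snippet, not taken from any particular site:

    # Opt out of Google's AI training use only
    User-agent: Google-Extended
    Disallow: /

Because Googlebot never matches this group, it keeps its default access and the site stays in ordinary search results.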
Training bots vs search bots
Both OpenAI and Anthropic now run multiple crawlers. Training bots (GPTBot, ClaudeBot) collect data to improve models. Search bots (OAI-SearchBot) power real-time features like ChatGPT browsing.
Sites want to appear in AI search results. That's traffic. They don't want their content used to train the next model version. But the gap is narrowing. Search bot blocking jumped from 12% to 20% in just one month.
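A robots.txt that follows this pattern disallows the training crawlers and explicitly admits the search one. A hypothetical example; real blocklists typically name many more bots:

    # Keep model-training crawlers out
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    # Let the crawler behind AI search and browsing features in
    User-agent: OAI-SearchBot
    Allow: /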
The irony: AI companies blocking AI
Even AI companies block each other's crawlers. ChatGPT.com still leads with 10 AI bots blocked. Most other AI company sites either don't have a robots.txt at all or don't block other AI crawlers.
llms.txt adoption is still low
While 28% of sites have AI rules in their robots.txt, only 6% have llms.txt, the new standard for communicating with AI instead of blocking it. That's down from 10% in January, more likely a difference in sampling than a real decline.
The adopters are still mostly tech companies: Adobe, Shopify, Stripe, WordPress, Dropbox, PayPal, Nvidia. They see AI as a distribution channel. When ChatGPT recommends "use Stripe for payments," that's valuable.
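For context, the proposal at llmstxt.org describes llms.txt as a Markdown file served at the site root: a title, a one-line summary, and sections of curated links pointing AI systems at LLM-friendly versions of key pages. A minimal, hypothetical example:

    # Example Payments Co
    > Hosted payments APIs and SDKs for developers.

    ## Docs
    - [API reference](https://example.com/docs/api.md): endpoints, auth, and errors
    - [Quickstart](https://example.com/docs/quickstart.md): accept a first payment

    ## Optional
    - [Changelog](https://example.com/changelog.md)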
What happens next
Robots.txt was designed in 1994 for a simpler web. Now it's handling questions its creators never imagined. What's the difference between training and inference? Between a search engine and a chatbot?
New standards will emerge. Licensing deals will multiply. Media companies blocking today may become invisible to users who discover content through AI. Developer tools embracing AI may get recommended by every assistant.
Methodology
January 2026: We analyzed 10,000 top websites from the Tranco list, fetching robots.txt and llms.txt from each. February 2026: We re-ran the analysis on 7,636 top sites (TOP_100, TOP_1K, TOP_10K tiers), of which 1,912 have robots.txt. Percentages are of sites with robots.txt. We parsed for 95+ known AI crawler user-agents across OpenAI, Anthropic, Google, Meta, Perplexity, ByteDance, Apple, and others. Bots were categorized by purpose: training, search, and assistant. Company-level blocking is calculated as the percentage of sites with any AI rules that block at least one bot from that company.
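For readers who want to spot-check a domain themselves, the core test, whether a site's robots.txt disallows a given AI crawler, can be sketched with Python's standard library. The bot list below is an illustrative handful of the 95+ user-agents we tracked, and the script is a simplified sketch, not our full pipeline:

    # Sketch only: report which tracked AI crawlers a domain's robots.txt
    # disallows from "/". Bot list is a small illustrative subset.
    import urllib.request
    import urllib.robotparser

    AI_BOTS = {
        "GPTBot": ("OpenAI", "training"),
        "OAI-SearchBot": ("OpenAI", "search"),
        "ClaudeBot": ("Anthropic", "training"),
        "Google-Extended": ("Google", "training"),
        "PerplexityBot": ("Perplexity", "search"),
    }

    def blocked_bots(domain):
        """Return the tracked AI bots that may not fetch the site root."""
        parser = urllib.robotparser.RobotFileParser()
        try:
            with urllib.request.urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except OSError:
            return []  # missing or unreachable robots.txt: no rules, nothing blocked
        parser.parse(body.splitlines())
        root = f"https://{domain}/"
        return [bot for bot in AI_BOTS if not parser.can_fetch(bot, root)]

    if __name__ == "__main__":
        print(blocked_bots("example.com"))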