How to Block AI Crawlers from Your Website

Control how AI companies use your content. Block, allow, or choose selectively.

21% of top sites block GPTBot. 79% allow it. Here's how to make your choice.

Should You Block AI Crawlers?

Blocking is a choice, not inherently good or bad. Consider these trade-offs based on your goals.

🛡️If you block AI crawlers...

+Compliant crawlers won't collect your content for future AI training
+More control over your data
+Contributors may prefer this stance
−Miss out on AI search traffic
−Won't be cited as a source in ChatGPT search answers

🌐If you allow AI crawlers...

+May get cited in ChatGPT answers
+More visibility in AI search
+Developers building AI tools can access your content
−Content can become training data
−Little control over how content is used once it has been collected

Popular middle ground: selective blocking, which blocks training bots and allows search bots. Compliant AI companies won't use your content to train models, but it can still appear in AI search results.

Copy-Paste robots.txt Rules

Add these rules to your robots.txt file to control AI crawlers.

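Blocks AI training crawlers plus several AI search and assistant bots (ChatGPT-User, Claude-Web, PerplexityBot, YouBot). For selective blocking, delete those four groups so AI search tools can still reach your site.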

robots.txt

# Block AI training crawlers and AI search/assistant bots
# Add this to your robots.txt file

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /
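The rules above use one group per bot. RFC 9309, the robots.txt standard, also lets several `User-agent` lines share one rule group, which keeps long lists compact. The minimal sketch below builds that grouped form with a hypothetical helper, `grouped_rules`, and uses Python's standard `urllib.robotparser` to check that every listed bot is blocked while a bot outside the list (OAI-SearchBot) is not; the example.com URL is a placeholder. Keep the `User-agent` lines consecutive: some parsers, including Python's, treat a blank line between them as the end of the group.

```python
from urllib.robotparser import RobotFileParser

# Same bots as the per-bot groups above, written as a single group.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai", "Claude-Web",
    "Google-Extended", "CCBot", "Bytespider", "cohere-ai", "Diffbot",
    "ImagesiftBot", "Omgilibot", "FacebookBot", "PerplexityBot", "YouBot",
]

def grouped_rules(bots):
    """Build one robots.txt group: consecutive User-agent lines, one Disallow."""
    lines = [f"User-agent: {bot}" for bot in bots]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"

robots_txt = grouped_rules(AI_BOTS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Every listed bot should be blocked from any path.
blocked = {bot: not parser.can_fetch(bot, "https://example.com/article") for bot in AI_BOTS}
print(all(blocked.values()))  # True
# A bot that isn't named in the group is unaffected.
print(parser.can_fetch("OAI-SearchBot", "https://example.com/article"))  # True
```

Both layouts express the same policy; the grouped one is simply shorter to maintain as the list grows.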

Understanding AI Bot Types

AI bots serve different purposes. A common approach is to block training bots but allow search bots.

🚫Training Bots

These collect data to train AI models. This is what most creators want to block.

• GPTBot (OpenAI training)
• ClaudeBot (Anthropic training)
• Google-Extended (Gemini training)
• CCBot (Common Crawl datasets)
• Applebot-Extended (Apple AI training opt-out)

✓Search Bots

These fetch pages in real time to answer users' questions. Blocking them can remove your site from AI search results.

• ChatGPT-User (user-initiated visits from ChatGPT)
• PerplexityBot (Perplexity AI search)
• OAI-SearchBot (ChatGPT search)
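A minimal sketch of the selective approach, assuming GPTBot, ClaudeBot, Google-Extended, and CCBot are the training bots you want to exclude: Python's standard `urllib.robotparser` parses a selective robots.txt and reports which bots may fetch a sample page (the example.com URL is a placeholder). When run, the four training bots print `blocked` and the three search bots print `allowed`.

```python
from urllib.robotparser import RobotFileParser

# Selective policy: block the training bots, leave search bots alone.
SELECTIVE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(SELECTIVE_ROBOTS_TXT.splitlines())

url = "https://example.com/blog/post"
for bot in ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
            "ChatGPT-User", "PerplexityBot", "OAI-SearchBot"]:
    status = "allowed" if parser.can_fetch(bot, url) else "blocked"
    print(f"{bot:16} {status}")
```

The final `User-agent: *` group is optional; it only makes the default (everyone else may crawl) explicit.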

Complete AI Bot Reference

Major AI crawlers and their user-agent tokens. The table below is an excerpt; our full database tracks 95 AI crawlers.

Bot	User-agent token	Company	Purpose	Importance	Respects robots.txt (? = unknown)
GPTBot	`GPTBot`	OpenAI	Training	Tier 1	Yes
OAI-SearchBot	`OAI-SearchBot`	OpenAI	Search	Tier 1	Yes
ChatGPT-User	`ChatGPT-User`	OpenAI	Assistant	Tier 1	Yes
ClaudeBot	`ClaudeBot`	Anthropic	Training	Tier 1	Yes
Claude-SearchBot	`Claude-SearchBot`	Anthropic	Search	Tier 1	Yes
Claude-User	`Claude-User`	Anthropic	Assistant	Tier 1	Yes
Google-Extended	`Google-Extended`	Google	Training	Tier 1	Yes
Gemini Deep Research	`Gemini-Deep-Research`	Google	Assistant	Tier 1	?
ChatGPT Agent	`ChatGPT Agent`	OpenAI	Agent	Tier 2	Yes
Operator	`Operator`	OpenAI	Agent	Tier 2	?
GoogleOther	`GoogleOther`	Google	Training	Tier 2	Yes
Google-CloudVertexBot	`Google-CloudVertexBot`	Google	Training	Tier 2	Yes
Google NotebookLM	`Google-NotebookLM`	Google	Assistant	Tier 2	?
Project Mariner	`GoogleAgent-Mariner`	Google	Agent	Tier 2	?
Meta External Agent	`meta-externalagent`	Meta	Training	Tier 2	Yes
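Because each row pairs a user-agent token with a purpose, the table can drive rule generation directly. The sketch below is illustrative only: `AIBot` and `robots_rules` are hypothetical names, and the rows are a subset copied from the table. Passing `{"Training"}` produces a selective file; adding `"Search"`, `"Assistant"`, and `"Agent"` produces a restrictive one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AIBot:
    name: str     # display name from the table
    token: str    # user-agent token used in robots.txt
    company: str
    purpose: str  # "Training", "Search", "Assistant", or "Agent"

# A few rows copied from the reference table above.
BOTS = [
    AIBot("GPTBot", "GPTBot", "OpenAI", "Training"),
    AIBot("OAI-SearchBot", "OAI-SearchBot", "OpenAI", "Search"),
    AIBot("ChatGPT-User", "ChatGPT-User", "OpenAI", "Assistant"),
    AIBot("ClaudeBot", "ClaudeBot", "Anthropic", "Training"),
    AIBot("Claude-SearchBot", "Claude-SearchBot", "Anthropic", "Search"),
    AIBot("Google-Extended", "Google-Extended", "Google", "Training"),
    AIBot("Meta External Agent", "meta-externalagent", "Meta", "Training"),
]

def robots_rules(bots, block_purposes):
    """Emit one Disallow group per bot whose purpose is in block_purposes."""
    groups = [
        f"User-agent: {bot.token}\nDisallow: /"
        for bot in bots
        if bot.purpose in block_purposes
    ]
    return "\n\n".join(groups) + "\n"

print(robots_rules(BOTS, {"Training"}))
```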

Step-by-Step Implementation

Find your robots.txt file

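Your robots.txt file lives at the root of your domain: https://yourdomain.com/robots.txt. Each subdomain needs its own file (for example, https://blog.yourdomain.com/robots.txt). If the file doesn't exist yet, create a plain-text file named robots.txt at the root.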
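Crawlers look for robots.txt at the root of each scheme, host, and port, so the file that governs a page can be derived from the page URL alone. A minimal sketch using Python's `urllib.parse`; `robots_txt_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    robots.txt lives at the root of each scheme + host + port, so the
    page's path, query, and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://yourdomain.com/blog/post?id=7#comments"))
# https://yourdomain.com/robots.txt
print(robots_txt_url("https://blog.yourdomain.com/about"))
# https://blog.yourdomain.com/robots.txt
```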

Add the blocking rules

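Copy the rules above into your robots.txt file, keeping any rules already there. Then decide on your strategy: the sample blocks both training and search bots; for selective blocking, delete the groups for search and assistant bots (ChatGPT-User, Claude-Web, PerplexityBot, YouBot); to allow everything, add no AI rules at all.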
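When you add AI rules to an existing file, remember that a crawler follows only the group that matches it most specifically: a bot with its own group ignores the `User-agent: *` rules. The sketch below uses a made-up existing file and Python's `urllib.robotparser` (a simple parser that takes the first matching group and first matching rule, which is enough for this case) to show the effect; `check` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical existing robots.txt with AI groups appended at the end.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def check(bot: str, path: str) -> str:
    """Report whether bot may fetch path on the (placeholder) site."""
    return "allowed" if parser.can_fetch(bot, "https://example.com" + path) else "blocked"

# Googlebot has no group of its own, so the * group applies to it.
print(check("Googlebot", "/admin/users"))      # blocked
print(check("Googlebot", "/blog/"))            # allowed
# GPTBot matches its own group and is blocked everywhere.
print(check("GPTBot", "/blog/"))               # blocked
# OAI-SearchBot's own group replaces the * group, so /admin/ is open to it.
print(check("OAI-SearchBot", "/admin/users"))  # allowed
```

If a bot with its own group should still respect paths like /admin/, repeat those Disallow lines inside its group.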

Verify with StackFox

Use our robots.txt grader to check that your rules parse as intended and to see which bots you're blocking.

Display your AI policy (optional)

Get a free badge to show visitors your AI policy. It can help build trust with your community.

Show Your AI Policy with a Badge

Let visitors know your stance on AI. Free verification badge for your site.


Choose the Right Strategy for Your Site

⚖️Selective (Block Training, Allow Search)

A common stance: compliant crawlers won't use your content to train AI models, but it can still appear in AI search results.

Site Type	Recommended Message	Why
News sites	`Training Blocked`	Visible in AI search, won't train models
Developer docs	`AI Search OK`	Developers can find you; content isn't used for training
Blogs	`Visible in AI, Won't Train AI`	Best of both worlds

🛡️Restrictive (Block Training AND Search)

Maximum restriction: AI crawlers that respect robots.txt won't access your content for any purpose.

Site Type	Recommended Message	Why
Forums/communities	`We Protect Our Community`	Appeals to contributors
Creative writing	`Your Content Stays Yours`	Resonates with creators
Art platforms	`Creator-Safe Platform`	Artist-focused messaging

✓Open (Allow Everything)

AI-friendly: You welcome AI tools and agents to access your content.

Site Type	Recommended Message	Why
API documentation	`AI-Ready`	Developers love this
Open source projects	`Agent-Friendly`	Welcomes AI tools
Public data sites	`Available for AI Training`	Clear permission

Frequently Asked Questions

Does blocking AI crawlers hurt my SEO?

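No. Blocking AI-specific crawlers doesn't affect your Google rankings as long as you leave Googlebot alone. Googlebot (which crawls for Google Search) and Google-Extended (a robots.txt token that controls whether Google may use your content to train Gemini models) are separate: you can disallow Google-Extended and still be crawled, indexed, and ranked by Googlebot. The same holds for other search engines: their main crawlers, such as Bingbot, are unaffected unless your robots.txt blocks them by name or through a `User-agent: *` rule.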
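A minimal check of that separation with Python's `urllib.robotparser`, using a placeholder URL. Google-Extended is a control token rather than a separate crawler (Google crawls with its existing user agents), so this models how the rule is interpreted, not traffic you would see in logs.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/guide"
print(parser.can_fetch("Google-Extended", url))  # False: opted out of Gemini training
print(parser.can_fetch("Googlebot", url))        # True: Search crawling unaffected
print(parser.can_fetch("Bingbot", url))          # True
```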

Do AI crawlers actually respect robots.txt?

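Major AI companies (OpenAI, Anthropic, Google, Microsoft) state that their official crawlers respect robots.txt. Some smaller or less scrupulous crawlers may not, because robots.txt is a request, not an enforcement mechanism. We flag which bots are known to respect or ignore robots.txt in our database. To enforce a block, you need server-level rules (in your web server, CDN, or firewall) that reject requests by user agent or IP address, and even these can be evaded by crawlers that fake their user agent.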
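A sketch of server-level blocking for a Python WSGI application, assuming you can wrap the app in middleware; `block_ai_crawlers`, `BLOCKED_AGENTS`, `hello_app`, and `call` are hypothetical names, and the user-agent strings in the usage lines are illustrative. It answers 403 when the `User-Agent` header contains a blocked token. Tokens such as Google-Extended and Applebot-Extended only appear in robots.txt, never in request headers, so they are left out. In production the same rule usually lives in the web server or CDN configuration, and some operators publish the IP ranges their crawlers use, which is harder to spoof than a header.

```python
from wsgiref.util import setup_testing_defaults  # only for the usage example

# Hypothetical blocklist: substrings matched case-insensitively against
# the User-Agent header.
BLOCKED_AGENTS = ("gptbot", "claudebot", "ccbot", "bytespider", "perplexitybot")

def block_ai_crawlers(app, blocked=BLOCKED_AGENTS):
    """WSGI middleware that answers 403 to requests from blocked user agents."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware

def hello_app(environ, start_response):
    """Stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

app = block_ai_crawlers(hello_app)

def call(user_agent):
    """Send one fake request through the middleware and return its status."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    statuses = []
    app(environ, lambda status, headers: statuses.append(status))
    return statuses[0]

print(call("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"))  # 403 Forbidden
print(call("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # 200 OK
```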

Which AI bots should I block?

It depends on your goals. To prevent training but stay visible in AI search, block training bots (GPTBot, ClaudeBot, Google-Extended) and allow search bots (OAI-SearchBot, ChatGPT-User, PerplexityBot). For maximum restriction, block all AI bots. For maximum AI visibility, allow everything. Our robots.txt generator can create the right rules for your situation.

Why would I want to ALLOW AI crawlers?

Allowing AI crawlers means your content can appear in AI search results (such as ChatGPT search or Perplexity), potentially driving traffic to your site. API documentation, open source projects, and public information sites often benefit from being AI-accessible. It's a trade-off between control and visibility.

What's the difference between GPTBot and ChatGPT-User?

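GPTBot collects content that may be used to train future GPT models. Blocking it stops future collection, but it can't remove content from models that have already been trained. OAI-SearchBot crawls pages so they can be surfaced in ChatGPT search results, and ChatGPT-User fetches a page when a ChatGPT user's request requires visiting it. Blocking GPTBot opts you out of training; to stay out of ChatGPT's search results, block OAI-SearchBot (and ChatGPT-User if you also want to refuse user-initiated visits).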

How do I know if my blocking is working?

Use our robots.txt grader to analyze your current configuration. It shows exactly which AI bots you're blocking and which you're allowing. You can also enter your domain on our homepage to see your full AI policy analysis.
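If you'd rather check locally, a small report function built on Python's `urllib.robotparser` gives a similar per-bot summary for one path. This is a sketch, not the grader's implementation: `ai_policy_report` is a hypothetical name and the robots.txt text is an example. For a live site, the parser can fetch the file itself with `RobotFileParser("https://yourdomain.com/robots.txt")` followed by `.read()`.

```python
from urllib.robotparser import RobotFileParser

def ai_policy_report(robots_txt: str, bots, path: str = "/") -> dict:
    """Map each bot token to 'allowed' or 'blocked' for one path on the site."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    url = "https://example.com" + path  # placeholder host; only the path matters
    return {bot: ("allowed" if parser.can_fetch(bot, url) else "blocked") for bot in bots}

robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

report = ai_policy_report(robots_txt, ["GPTBot", "ClaudeBot", "PerplexityBot"])
for bot, status in report.items():
    print(f"{bot}: {status}")
```

Run it for a few representative paths: a bot can be allowed at `/` and still be blocked from sections covered by the `*` group.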

What about llms.txt? Is that different from robots.txt?

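Yes. llms.txt is a proposed convention (not a formal standard) for a Markdown file at /llms.txt that gives AI systems a concise overview of your site and links to its most useful content. robots.txt controls which crawlers may fetch your pages; llms.txt doesn't block or allow anything, but it can provide context, usage guidelines, and licensing information. Learn more at llmstxt.org.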
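A sketch of generating an llms.txt skeleton, assuming the layout described at llmstxt.org: an H1 title, a blockquote summary, then H2 sections containing Markdown link lists, with an optional section named "Optional" for secondary links. `build_llms_txt` and the example titles and URLs are hypothetical. The output only describes your content; crawl permissions still come from robots.txt.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt skeleton: H1 title, blockquote summary,
    then H2 sections of Markdown links with optional notes."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)

print(build_llms_txt(
    "Example Docs",
    "Reference documentation for the hypothetical Example API.",
    {
        "Docs": [("Quickstart", "https://example.com/quickstart.md", "Install and first request")],
        "Optional": [("Changelog", "https://example.com/changelog.md", "")],
    },
))
```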

Is this free?

Yes, all our tools are free: the robots.txt grader, robots.txt generator, and the verification badge. We analyze your site and generate badges at no cost.

Check Your Site's AI Policy

See which AI bots you're blocking or allowing. Get a free badge to display your stance.
