How to Block AI Crawlers from Your Website
Control how AI companies use your content. Block, allow, or choose selectively.
21% of top sites block GPTBot. 79% allow it. Here's how to make your choice.
Should You Block AI Crawlers?
Blocking is a choice, not inherently good or bad. Consider these trade-offs based on your goals.
If you block AI crawlers...
- Pro: Content won't train future AI models
- Pro: More control over your data
- Pro: Contributors may prefer this stance
- Con: Miss out on AI search traffic
- Con: Won't appear in ChatGPT answers
If you allow AI crawlers...
- Pro: May get cited in ChatGPT answers
- Pro: More visibility in AI search
- Pro: Developers building AI tools can access your content
- Con: Content becomes training data
- Con: Less control once scraped
Most popular choice: selective blocking. Block training bots and allow search bots, so your content won't train AI models but can still appear in AI search results.
Copy-Paste robots.txt Rules
Add these rules to your robots.txt file to control AI crawlers.
Prevents your content from being used to train AI models. Most popular choice.
```
# Block AI Training Crawlers
# Add this to your robots.txt file
# Note: ChatGPT-User and PerplexityBot are classed as search bots under
# "Understanding AI Bot Types"; remove those groups to stay visible in AI search.
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /
```
Understanding AI Bot Types
Not all AI bots are created equal. Most sites want to block training but allow search.
Training Bots
These collect data to train AI models. This is what most creators want to block.
- GPTBot (OpenAI training)
- ClaudeBot (Anthropic training)
- Google-Extended (Gemini training)
- CCBot (Common Crawl datasets)
Search Bots
These answer questions in real-time. Blocking these removes you from AI search.
- ChatGPT-User (ChatGPT browsing)
- PerplexityBot (Perplexity AI search)
- OAI-SearchBot (ChatGPT search)
- Applebot-Extended (Apple Intelligence)
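As a rough illustration of selective blocking, the sketch below builds robots.txt rules from the two groups listed above. The function name `build_robots_txt` and the constants `TRAINING_BOTS` and `SEARCH_BOTS` are hypothetical names chosen for this example, and the grouping simply mirrors the lists above rather than any official classification.

```python
# Bot groupings copied from the lists above; adjust them to your own policy.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
SEARCH_BOTS = ["ChatGPT-User", "PerplexityBot", "OAI-SearchBot", "Applebot-Extended"]


def build_robots_txt(blocked_bots):
    """Return robots.txt text that disallows the whole site for each bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in blocked_bots]
    # A blank line between groups keeps each user-agent's rules separate.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Selective blocking: block training bots only, leave search bots alone.
    print(build_robots_txt(TRAINING_BOTS))
```

Passing `TRAINING_BOTS + SEARCH_BOTS` instead would produce the restrictive variant that blocks both groups.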
Complete AI Bot Reference
Full database of AI crawlers with user-agent strings.
95 AI crawlers tracked
| Bot | User-Agent | Company | Purpose | Importance | Respects robots.txt |
|---|---|---|---|---|---|
| GPTBot | GPTBot | OpenAI | Training | Tier 1 | Yes |
| OAI-SearchBot | OAI-SearchBot | OpenAI | Search | Tier 1 | Yes |
| ChatGPT-User | ChatGPT-User | OpenAI | Assistant | Tier 1 | Yes |
| ClaudeBot | ClaudeBot | Anthropic | Training | Tier 1 | Yes |
| Claude-SearchBot | Claude-SearchBot | Anthropic | Search | Tier 1 | Yes |
| Claude-User | Claude-User | Anthropic | Assistant | Tier 1 | Yes |
| Google-Extended | Google-Extended | Google | Training | Tier 1 | Yes |
| Gemini Deep Research | Gemini-Deep-Research | Google | Assistant | Tier 1 | ? |
| ChatGPT Agent | ChatGPT Agent | OpenAI | Agent | Tier 2 | Yes |
| Operator | Operator | OpenAI | Agent | Tier 2 | ? |
| GoogleOther | GoogleOther | Google | Training | Tier 2 | Yes |
| Google-CloudVertexBot | Google-CloudVertexBot | Google | Training | Tier 2 | Yes |
| Google NotebookLM | Google-NotebookLM | Google | Assistant | Tier 2 | ? |
| Project Mariner | GoogleAgent-Mariner | Google | Agent | Tier 2 | ? |
| Meta External Agent | meta-externalagent | Meta | Training | Tier 2 | Yes |
Step-by-Step Implementation
Find your robots.txt file
Your robots.txt file is at the root of your domain: https://yourdomain.com/robots.txt. If it doesn't exist, create one.
Add the blocking rules
Copy the rules from above and paste them into your robots.txt file. Choose between blocking all training bots, selective blocking, or allowing all.
Verify with StackFox
Use our robots.txt grader to verify your rules are working correctly and see which bots you're blocking.
Display your AI policy (optional)
Get a free badge to show visitors your AI policy stance. Builds trust with your community.
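For the verification step above, the rules can also be sanity-checked locally. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `blocked_bots`, the sample rules, and the `yourdomain.com` URL are placeholders for illustration. It only reports what the rules say and cannot tell whether a given crawler actually obeys them.

```python
from urllib.robotparser import RobotFileParser


def blocked_bots(robots_txt, bots, url="https://yourdomain.com/"):
    """Return the bots from `bots` that the given rules disallow for `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    # Hypothetical sample rules; paste your own robots.txt content here.
    sample_rules = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""
    print(blocked_bots(sample_rules, ["GPTBot", "CCBot", "PerplexityBot"]))
```

To check a live site instead of a pasted string, `RobotFileParser` also provides `set_url()` and `read()` to fetch the file over HTTP.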
Show Your AI Policy with a Badge
Let visitors know your stance on AI. Free verification badge for your site.
Choose the Right Strategy for Your Site
Selective (Block Training, Allow Search)
The most common stance: compliant training bots skip your content, while it can still appear in AI search results.
| Site Type | Recommended Message | Why |
|---|---|---|
| News sites | Training Blocked | Visible in AI search, won't train models |
| Developer docs | AI Search OK | Developers find you, data safe |
| Blogs | Visible in AI, Won't Train AI | Best of both worlds |
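Restrictive (Block Training AND Search)
Full protection: you ask every AI crawler to stay away, whatever its purpose. robots.txt is a request, not an enforcement mechanism, so this only stops bots that honor it.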
| Site Type | Recommended Message | Why |
|---|---|---|
| Forums/communities | We Protect Our Community | Appeals to contributors |
| Creative writing | Your Content Stays Yours | Resonates with creators |
| Art platforms | Creator-Safe Platform | Artist-focused messaging |
Open (Allow Everything)
AI-friendly: You welcome AI tools and agents to access your content.
| Site Type | Recommended Message | Why |
|---|---|---|
| API documentation | AI-Ready | Developers love this |
| Open source projects | Agent-Friendly | Welcomes AI tools |
| Public data sites | Available for AI Training | Clear permission |
Frequently Asked Questions
Does blocking AI crawlers hurt my SEO?
Do AI crawlers actually respect robots.txt?
Which AI bots should I block?
Why would I want to ALLOW AI crawlers?
What's the difference between GPTBot and ChatGPT-User?
How do I know if my blocking is working?
What about llms.txt? Is that different from robots.txt?
Is this free?
Check Your Site's AI Policy
See which AI bots you're blocking or allowing. Get a free badge to display your stance.