🦊 StackFox

How to Block AI Crawlers from Your Website

Control how AI companies use your content. Block, allow, or choose selectively.

21% of top sites block GPTBot. 79% allow it. Here's how to make your choice.

Should You Block AI Crawlers?

Blocking is a choice, not inherently good or bad. Consider these trade-offs based on your goals.

🛡️ If you block AI crawlers...

  • + Content won't train future AI models
  • + More control over your data
  • + Contributors may prefer this stance
  • − Miss out on AI search traffic
  • − Won't appear in ChatGPT answers

If you allow AI crawlers...

  • + May get cited in ChatGPT answers
  • + More visibility in AI search
  • + Developers building AI tools can access your content
  • − Content becomes training data
  • − Less control once scraped

Most popular choice: selective blocking. Block training bots and allow search bots, so your content won't train AI models but can still appear in AI search results.

Copy-Paste robots.txt Rules

Add these rules to your robots.txt file to control AI crawlers.

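Prevents your content from being used to train AI models. Most popular choice. Note: as listed, these rules also block ChatGPT-User and PerplexityBot, which this guide classifies as search bots further down; remove those two entries if you want to stay visible in AI search.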

robots.txt
# Block AI Training Crawlers
# Add this to your robots.txt file

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /
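
One way to sanity-check rules like these before publishing them is Python's standard-library urllib.robotparser. The sketch below is only an illustration, not the StackFox grader: it parses a shortened copy of the rules (three of the bots listed above) and asks whether a given user agent may fetch the site root. The helper name is_allowed and the yourdomain.com URL are placeholders, and real crawlers may match user-agent tokens slightly differently than Python's parser does.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above (three of the listed bots).
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(rules: str, user_agent: str,
               url: str = "https://yourdomain.com/") -> bool:
    """Return True if `user_agent` may fetch `url` under `rules`.

    `is_allowed` is a hypothetical helper name; the URL is a placeholder.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Googlebot is not named in the rules, so it should stay allowed.
    for bot in ("GPTBot", "ClaudeBot", "Google-Extended", "Googlebot"):
        print(bot, "allowed" if is_allowed(RULES, bot) else "blocked")
```

Because no group names Googlebot and there is no catch-all `User-agent: *` group, this parser treats Googlebot as allowed while the three named tokens are blocked.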

Understanding AI Bot Types

Not all AI bots are created equal. Most sites want to block training but allow search.

🚫 Training Bots

These collect data to train AI models. This is what most creators want to block.

  • GPTBot (OpenAI training)
  • ClaudeBot (Anthropic training)
  • Google-Extended (Gemini training)
  • CCBot (Common Crawl datasets)

✓ Search Bots

These answer questions in real time. Blocking these removes you from AI search.

  • ChatGPT-User (ChatGPT browsing)
  • PerplexityBot (Perplexity AI search)
  • OAI-SearchBot (ChatGPT search)
  • Applebot-Extended (Apple Intelligence)
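
The two lists above can be turned into a selective robots.txt mechanically. The following is a minimal sketch under stated assumptions: the bot names are copied from the lists as this guide classifies them, and the function name build_selective_rules is made up for illustration. The explicit `Allow: /` groups are optional, since a bot that is not named in robots.txt is allowed by default; they are included only to make the policy readable.

```python
# Bot names copied from the two lists above, as classified by this guide.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
SEARCH_BOTS = ["ChatGPT-User", "PerplexityBot", "OAI-SearchBot",
               "Applebot-Extended"]


def build_selective_rules(block, allow):
    """Build robots.txt text that blocks `block` and allows `allow`.

    Hypothetical helper: one group per bot, separated by blank lines.
    """
    lines = ["# Selective policy: block training bots, allow search bots", ""]
    for bot in block:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allow:
        # Explicit Allow groups are redundant but document the intent.
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_selective_rules(TRAINING_BOTS, SEARCH_BOTS))
```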

Complete AI Bot Reference

Full database of AI crawlers with user-agent strings.

95 AI crawlers tracked

| Bot | User-Agent | Company | Purpose | Importance | Respects robots.txt |
|---|---|---|---|---|---|
| GPTBot | GPTBot | OpenAI | Training | Tier 1 | Yes |
| OAI-SearchBot | OAI-SearchBot | OpenAI | Search | Tier 1 | Yes |
| ChatGPT-User | ChatGPT-User | OpenAI | Assistant | Tier 1 | Yes |
| ClaudeBot | ClaudeBot | Anthropic | Training | Tier 1 | Yes |
| Claude-SearchBot | Claude-SearchBot | Anthropic | Search | Tier 1 | Yes |
| Claude-User | Claude-User | Anthropic | Assistant | Tier 1 | Yes |
| Google-Extended | Google-Extended | Google | Training | Tier 1 | Yes |
| Gemini Deep Research | Gemini-Deep-Research | Google | Assistant | Tier 1 | ? |
| ChatGPT Agent | ChatGPT Agent | OpenAI | Agent | Tier 2 | Yes |
| Operator | Operator | OpenAI | Agent | Tier 2 | ? |
| GoogleOther | GoogleOther | Google | Training | Tier 2 | Yes |
| Google-CloudVertexBot | Google-CloudVertexBot | Google | Training | Tier 2 | Yes |
| Google NotebookLM | Google-NotebookLM | Google | Assistant | Tier 2 | ? |
| Project Mariner | GoogleAgent-Mariner | Google | Agent | Tier 2 | ? |
| Meta External Agent | meta-externalagent | Meta | Training | Tier 2 | Yes |
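
If you want to see which of these bots show up in your own access logs, a lookup keyed on the user-agent token is one simple approach. The sketch below is an assumption-laden illustration: it copies a handful of rows from the table above, the function name classify_user_agent is hypothetical, and the sample header strings in the usage lines are made up rather than the bots' exact published user agents.

```python
# Purpose by user-agent token, copied from a few rows of the table above.
BOT_PURPOSES = {
    "GPTBot": "Training",
    "OAI-SearchBot": "Search",
    "ChatGPT-User": "Assistant",
    "ClaudeBot": "Training",
    "Claude-SearchBot": "Search",
    "Claude-User": "Assistant",
    "meta-externalagent": "Training",
}


def classify_user_agent(header: str):
    """Return (token, purpose) for the first known token in `header`.

    Matching is a case-insensitive substring test; returns None when no
    known token is present. Hypothetical helper for illustration.
    """
    lowered = header.lower()
    for token, purpose in BOT_PURPOSES.items():
        if token.lower() in lowered:
            return token, purpose
    return None


if __name__ == "__main__":
    # Illustrative header strings, not the bots' exact published values.
    print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(classify_user_agent("Mozilla/5.0 (X11; Linux) Firefox/120.0"))
```

User-agent headers can be spoofed, so a match here is a hint about who is crawling, not proof.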

Step-by-Step Implementation

1. Find your robots.txt file

Your robots.txt file is at the root of your domain: https://yourdomain.com/robots.txt. If it doesn't exist, create one.

2. Add the blocking rules

Copy the rules from above and paste them into your robots.txt file. Choose between blocking all training bots, selective blocking, or allowing all.

3. Verify with StackFox

Use our robots.txt grader to verify your rules are working correctly and see which bots you're blocking.

4. Display your AI policy (optional)

Get a free badge to show visitors your AI policy stance. Builds trust with your community.
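
Steps 1 and 2 above can be sketched in a few lines of Python. This is a rough illustration under assumptions, not a StackFox tool: robots_url and add_block_rule are hypothetical helper names, the file path is whatever your web root uses, and the duplicate check is a naive substring test that a hand-edited file with unusual spacing could slip past.

```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Step 1: robots.txt lives at the root of the page's scheme and host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def add_block_rule(robots_path: Path, bot: str) -> bool:
    """Step 2: append a Disallow group for `bot` unless one is present.

    Creates the file if it does not exist. Returns True when the file
    was changed, False when a group for `bot` was already there.
    """
    existing = ""
    if robots_path.exists():
        existing = robots_path.read_text(encoding="utf-8")
    # Naive duplicate check: case-insensitive substring match.
    if f"user-agent: {bot}".lower() in existing.lower():
        return False
    if existing and not existing.endswith("\n"):
        existing += "\n"
    if existing:
        existing += "\n"  # blank line between groups
    group = f"User-agent: {bot}\nDisallow: /\n"
    robots_path.write_text(existing + group, encoding="utf-8")
    return True


if __name__ == "__main__":
    print(robots_url("https://yourdomain.com/blog/some-post"))
```

Running add_block_rule twice for the same bot leaves the file unchanged the second time, which makes it safe to re-run from a deploy script.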

Show Your AI Policy with a Badge

Let visitors know your stance on AI. Free verification badge for your site.

Example badges from real sites: Reddit, NY Times, GitHub.

Choose the Right Strategy for Your Site

⚖️ Selective (Block Training, Allow Search)

The most common stance: your content won't train AI models, but it will be visible in AI search results.

| Site Type | Recommended Message | Why |
|---|---|---|
| News sites | Training Blocked | Visible in AI search, won't train models |
| Developer docs | AI Search OK | Developers find you, data safe |
| Blogs | Visible in AI, Won't Train AI | Best of both worlds |

🛡️ Restrictive (Block Training AND Search)

Full opt-out: AI crawlers that respect robots.txt are told not to fetch your content for any purpose.

| Site Type | Recommended Message | Why |
|---|---|---|
| Forums/communities | We Protect Our Community | Appeals to contributors |
| Creative writing | Your Content Stays Yours | Resonates with creators |
| Art platforms | Creator-Safe Platform | Artist-focused messaging |

✓ Open (Allow Everything)

AI-friendly: You welcome AI tools and agents to access your content.

| Site Type | Recommended Message | Why |
|---|---|---|
| API documentation | AI-Ready | Developers love this |
| Open source projects | Agent-Friendly | Welcomes AI tools |
| Public data sites | Available for AI Training | Clear permission |

Frequently Asked Questions

Does blocking AI crawlers hurt my SEO?
No, blocking AI crawlers has no effect on your Google rankings. Googlebot (for search) and Google-Extended (for Gemini training) are separate bots. You can block Google-Extended while still being indexed by Googlebot. The same applies to Bing and other search engines.
Do AI crawlers actually respect robots.txt?
Major AI companies (OpenAI, Anthropic, Google, Microsoft) respect robots.txt for their official crawlers. However, some smaller or less scrupulous crawlers may not. We flag which bots are known to respect or ignore robots.txt in our database. For complete protection against all crawlers, you'd need server-level blocking.
Which AI bots should I block?
It depends on your goals. If you want to prevent training but stay visible in AI search, block training bots (GPTBot, ClaudeBot, Google-Extended) but allow search bots (ChatGPT-User, PerplexityBot). If you want complete protection, block all AI bots. If you want maximum AI visibility, allow everything. Use our robots.txt generator to create the right rules for your situation.
Why would I want to ALLOW AI crawlers?
Allowing AI crawlers means your content can appear in AI search results (like ChatGPT's browse mode or Perplexity), potentially driving traffic to your site. API documentation, open source projects, and public information sites often benefit from being AI-accessible. It's a trade-off between control and visibility.
What's the difference between GPTBot and ChatGPT-User?
GPTBot collects data to train future GPT models. Once a model has been trained on your content, that influence cannot practically be removed from the trained model. ChatGPT-User fetches pages in real time when users ask ChatGPT to look up current information. Blocking GPTBot prevents training; blocking ChatGPT-User prevents ChatGPT from retrieving your pages on a user's behalf, so your site won't appear in those live answers.
How do I know if my blocking is working?
Use our robots.txt grader to analyze your current configuration. It shows exactly which AI bots you're blocking and which you're allowing. You can also enter your domain on our homepage to see your full AI policy analysis.
What about llms.txt? Is that different from robots.txt?
Yes. llms.txt is a newer proposed convention, not yet a formal standard, that lets you provide structured information specifically for AI systems. While robots.txt blocks or allows crawlers, llms.txt can provide context and, if you choose to include them, usage guidelines and licensing information. Learn more at llmstxt.org.
Is this free?
Yes, all our tools are free: the robots.txt grader, robots.txt generator, and the verification badge. We analyze your site and generate badges at no cost.
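
The answer above on whether AI crawlers respect robots.txt mentions server-level blocking as the stronger option. What that looks like depends on your stack; the sketch below assumes a Python WSGI application and is only one possible approach. The class name BlockAICrawlers, the token list, and demo_app are all made up for illustration.

```python
# Tokens to refuse at the server; an example subset of the bots named above.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


class BlockAICrawlers:
    """WSGI middleware: answer 403 when the User-Agent has a blocked token."""

    def __init__(self, app, blocked_tokens=BLOCKED_TOKENS):
        self.app = app
        self.blocked = tuple(token.lower() for token in blocked_tokens)

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in self.blocked):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Not a blocked crawler: hand the request to the wrapped app.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for your real application (hypothetical)."""
    body = b"Hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = BlockAICrawlers(demo_app)
```

Unlike robots.txt, this refuses the request itself, but it still relies on the crawler sending an honest User-Agent header, so it will not stop a bot that disguises itself as a browser.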

Check Your Site's AI Policy

See which AI bots you're blocking or allowing. Get a free badge to display your stance.

How to Block AI Crawlers from Your Website (2026 Guide) | StackFox