GPTBot
Tier 1
Collects data for training GPT models. Blocking GPTBot prevents your content from being used in future model training.
What is GPTBot?
GPTBot is an AI training crawler operated by OpenAI. It collects data used to train OpenAI's AI models.
It gathers pre-training data for foundation models, and content it collects becomes part of the model weights permanently.
Blocking GPTBot means your content won't be used to train future GPT models (GPT-5, etc.). It does NOT affect current ChatGPT answers.
GPTBot is the most blocked AI bot: about 35% of top sites block it. OpenAI also has licensing deals with major publishers.
About OpenAI
OpenAI operates 5 known bots, including GPTBot, its AI model training crawler. OpenAI's services reach 300M+ weekly active users.
GPTBot robots.txt Configuration
Control GPTBot access to your website using robots.txt directives.
Block GPTBot
To completely block GPTBot from crawling your site:
User-agent: GPTBot
Disallow: /

Allow GPTBot Full Access
To explicitly allow GPTBot to crawl your entire site:
User-agent: GPTBot
Allow: /

Selective Access for GPTBot
To allow GPTBot but restrict certain directories:
User-agent: GPTBot
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /

GPTBot respects robots.txt directives.
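To check how the three robots.txt variants on this page treat GPTBot before deploying them, you can run them through Python's standard-library `urllib.robotparser`. This is a minimal sketch: the `gptbot_can_fetch` helper and the `example.com` URLs are hypothetical. Note also that Python's parser applies the first matching rule in file order, while RFC 9309 specifies the most specific (longest) match. For these three files both approaches give the same answers, but for more complex files this check may not reflect GPTBot's actual behavior.

```python
from urllib.robotparser import RobotFileParser

# The three robots.txt variants shown above.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

ALLOW_ALL = """\
User-agent: GPTBot
Allow: /
"""

SELECTIVE = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""


def gptbot_can_fetch(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Hypothetical helper: True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() also records a "last checked" time, so can_fetch() will not
    # treat the rules as unloaded and refuse everything.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com paths are placeholders; substitute your own URLs.
    page = "https://example.com/blog/post.html"
    admin = "https://example.com/admin/settings"
    print(gptbot_can_fetch(BLOCK_ALL, page))               # False
    print(gptbot_can_fetch(ALLOW_ALL, page))               # True
    print(gptbot_can_fetch(SELECTIVE, page))               # True
    print(gptbot_can_fetch(SELECTIVE, admin))              # False
    # Other crawlers are unaffected: there is no group for Googlebot.
    print(gptbot_can_fetch(BLOCK_ALL, page, "Googlebot"))  # True
```

Pass the bare token `GPTBot` as the agent, not a full User-Agent header. The standard-library parser compares only the part before the first `/` against each `User-agent` group.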
GPTBot User-Agent String
The user-agent token for GPTBot is:
GPTBot
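robots.txt groups match this token, but actual requests carry a longer User-Agent header that contains it. To flag GPTBot traffic in your own server code, a minimal sketch is to search the header for the token as a whole word, ignoring case. The `is_gptbot` name and the sample header are illustrative assumptions; the version number and URL in real requests may differ. A User-Agent header can also be spoofed, so this is a heuristic, not proof that a request came from OpenAI.

```python
import re

# Whole-word, case-insensitive match for the GPTBot token (e.g. "GPTBot/1.1").
GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Heuristic: True if the User-Agent header contains the GPTBot token."""
    return GPTBOT_TOKEN.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative header only; real GPTBot requests may use a different version/URL.
    sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
              "GPTBot/1.1; +https://openai.com/gptbot)")
    print(is_gptbot(sample))                                       # True
    print(is_gptbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```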