Bytespider
Tier 2
ByteDance crawler for LLM training. Known to ignore robots.txt.
What is Bytespider?
Bytespider is an AI training crawler operated by ByteDance. It collects web content used to pre-train ByteDance AI models, including Doubao.
Bytespider may ignore robots.txt blocking rules, so consider IP-level blocking as well.
Controversial: the crawler is known to ignore robots.txt and to crawl aggressively. ByteDance is a Chinese company, and some sites block the crawler for geopolitical reasons.
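Because robots.txt may be ignored, an IP-level check is one possible fallback. The following is a minimal Python sketch, assuming you maintain your own blocklist of CIDR ranges taken from your server logs. The ranges shown are reserved documentation networks used as placeholders, not real Bytespider addresses, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), NOT real Bytespider IPs.
# Replace them with ranges observed in your own access logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Unparsable address: do not block on this rule alone.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

In practice a check like this usually lives in the firewall, CDN, or reverse proxy rather than in application code. The function only illustrates the matching logic.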
About ByteDance
ByteDance operates 3 known bots for AI model training. Its services include TikTok, with 1B+ users, and the Doubao AI assistant.
Bytespider robots.txt Configuration
Control Bytespider access to your website using robots.txt directives.
Block Bytespider
To completely block Bytespider from crawling your site:
User-agent: Bytespider
Disallow: /
Allow Bytespider Full Access
To explicitly allow Bytespider to crawl your entire site:
User-agent: Bytespider
Allow: /
Selective Access for Bytespider
To allow Bytespider but restrict certain directories:
User-agent: Bytespider
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
Warning: Bytespider may not fully respect robots.txt. Consider additional server-side controls if needed.
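One form a server-side control could take is a User-Agent filter in front of the application. The sketch below is a minimal Python WSGI middleware, assuming the crawler identifies itself with the `Bytespider` token in its User-Agent header. `block_bytespider` and `demo_app` are hypothetical names used only for illustration.

```python
def block_bytespider(app):
    """Wrap a WSGI app; answer 403 when the User-Agent contains 'Bytespider'."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        # Case-insensitive substring match on the crawler token.
        if "bytespider" in user_agent.lower():
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Any other client is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Tiny placeholder application used to show the wrapper in action."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


application = block_bytespider(demo_app)
```

A User-Agent header is easy to spoof, so a filter like this only stops crawlers that identify themselves honestly. Combining it with IP-level rules covers more cases.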
Bytespider User-Agent String
The user-agent token for Bytespider is:
Bytespider
Who Blocks Bytespider?
Other ByteDance Bots
Check Your Site's AI Policy
See if you're blocking or allowing Bytespider and other AI crawlers.
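A local check is also possible with Python's standard `urllib.robotparser` module. The sketch below parses robots.txt text and reports whether a given user-agent may fetch a URL. `crawler_allowed` is a hypothetical helper, and the `example.com` URLs in the usage note are placeholders.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The selective-access rules for Bytespider, as plain text.
SELECTIVE_RULES = """\
User-agent: Bytespider
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""

# The full-block rules for Bytespider.
BLOCK_RULES = """\
User-agent: Bytespider
Disallow: /
"""
```

For example, `crawler_allowed(SELECTIVE_RULES, "Bytespider", "https://example.com/private/data")` evaluates the `/private/` rule, while the same call with `BLOCK_RULES` evaluates the site-wide block. This reports only what robots.txt declares. It cannot tell you whether a crawler actually honors those rules.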