
Bytespider

Tier 2
📚 AI Training · by ByteDance · Since 2019

ByteDance crawler for LLM training. Known to ignore robots.txt.

User-Agent Token: Bytespider
Respects robots.txt: No
Impact Level: Major (100M+ users; e.g. Meta, Apple, Microsoft, Perplexity, xAI)
Estimated Reach: 1B+ TikTok users, Doubao AI assistant

🎯 What is Bytespider?

Bytespider is an AI training crawler operated by ByteDance. It collects web data used to train ByteDance's AI models.

📊 How Your Data is Used

Pre-training for ByteDance AI models, including Doubao.

🚫 What Happens If You Block

A robots.txt block may be IGNORED, so consider IP-level blocking as well.
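IP-level blocking means rejecting requests by source address before they reach your content. This page does not list Bytespider's address ranges, so the sketch below uses the RFC 5737 documentation ranges purely as placeholders. `BLOCKED_NETWORKS` and `is_blocked_ip` are hypothetical names, and you would need to substitute ranges you have verified yourself, for example from your own access logs.

```python
import ipaddress

# PLACEHOLDER networks (RFC 5737 documentation ranges), NOT ByteDance's
# real addresses. Replace them with ranges you have verified yourself.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked_ip(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network.

    Accepts IPv4 or IPv6 strings; an address never matches a network of
    the other IP version. Raises ValueError on a malformed address.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_NETWORKS)


print(is_blocked_ip("203.0.113.42"))  # True
print(is_blocked_ip("192.0.2.10"))    # False
```

In practice this check usually lives in a firewall, CDN, or web server configuration rather than in application code, but the logic is the same containment test.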

💡 Good to Know

CONTROVERSIAL: Known to ignore robots.txt and to crawl aggressively. Because ByteDance is a Chinese company, some sites also block it for geopolitical reasons.

๐ŸขAbout ByteDance


ByteDance operates three known bots for AI model training. Its services reach 1B+ TikTok users and include the Doubao AI assistant.

๐Ÿ›ก๏ธBytespider robots.txt Configuration

Control Bytespider access to your website using robots.txt directives.

Block Bytespider

To completely block Bytespider from crawling your site:

User-agent: Bytespider
Disallow: /
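One way to sanity-check these rules before deploying them is Python's standard `urllib.robotparser` module, which evaluates robots.txt the way a compliant crawler would. This is a sketch assuming the two lines above are your entire robots.txt and that `example.com` stands in for your domain. It shows what a compliant crawler should do, not what Bytespider actually does.

```python
from urllib.robotparser import RobotFileParser

# The block rule from above, assumed to be the whole robots.txt.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Every path is off-limits for Bytespider...
print(parser.can_fetch("Bytespider", "https://example.com/"))           # False
print(parser.can_fetch("Bytespider", "https://example.com/blog/post"))  # False
# ...while agents without a matching group are unaffected.
print(parser.can_fetch("Googlebot", "https://example.com/"))            # True
```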

Allow Bytespider Full Access

To explicitly allow Bytespider to crawl your entire site:

User-agent: Bytespider
Allow: /
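An explicit `Allow: /` mostly matters when a broader rule would otherwise block the crawler. The sketch below assumes a hypothetical robots.txt that shuts out all agents through a `User-agent: *` group and then re-admits Bytespider. A crawler that finds a group naming it follows that group instead of the `*` group, and Python's `urllib.robotparser` reproduces this behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everyone except Bytespider.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Bytespider
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Bytespider matches its own group, so the "*" group does not apply to it.
print(parser.can_fetch("Bytespider", "https://example.com/docs/"))  # True
# Any other agent falls back to the "*" group and is blocked.
print(parser.can_fetch("Googlebot", "https://example.com/docs/"))   # False
```

Without the `*` group, `Allow: /` changes nothing, because an agent with no matching rule is already allowed by default.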

Selective Access for Bytespider

To allow Bytespider but restrict certain directories:

User-agent: Bytespider
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
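The selective rules can be checked the same way. Paths are matched as prefixes, so the trailing slash matters: `Disallow: /api/` blocks `/api/v1/users` but not the bare path `/api`. One caveat on this sketch: Python's `urllib.robotparser` applies the first matching rule in file order, while RFC 9309 picks the longest (most specific) match. For this file both give the same answers, because every `Disallow` path is longer than `/` and listed before it.

```python
from urllib.robotparser import RobotFileParser

# The selective rules from above, assumed to be the whole robots.txt.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/blog/post", "/private/notes.html", "/api/v1/users", "/api", "/admin/"]:
    allowed = parser.can_fetch("Bytespider", "https://example.com" + path)
    print(f"{path:22} {'allowed' if allowed else 'blocked'}")
```

This prints `allowed` for `/blog/post` and `/api`, and `blocked` for the other three paths.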

⚠ Bytespider may not fully respect robots.txt. Consider additional server-side controls if needed.
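One common server-side control is to refuse requests whose `User-Agent` header contains the `Bytespider` token. The sketch below is a minimal WSGI middleware under that assumption; `BlockUserAgentMiddleware`, `BLOCKED_TOKENS`, `demo_app`, and `status_for` are hypothetical names. It only catches requests that identify themselves as Bytespider, since the header is client-controlled, so it complements rather than replaces IP-level blocking.

```python
from typing import Callable, Iterable

# Case-insensitive substrings to reject (hypothetical configuration).
BLOCKED_TOKENS = ("bytespider",)


class BlockUserAgentMiddleware:
    """WSGI middleware returning 403 when the User-Agent contains a blocked token."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Not a blocked agent: pass the request through unchanged.
        return self.app(environ, start_response)


def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for your real WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = BlockUserAgentMiddleware(demo_app)


def status_for(user_agent: str) -> str:
    """Call the wrapped app with a minimal environ and return the status line."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    app({"HTTP_USER_AGENT": user_agent}, start_response)
    return captured["status"]


print(status_for("Mozilla/5.0 (compatible; Bytespider)"))  # 403 Forbidden
print(status_for("Mozilla/5.0 (X11; Linux x86_64)"))        # 200 OK
```

In a Flask project, for example, the middleware would wrap the underlying WSGI callable: `app.wsgi_app = BlockUserAgentMiddleware(app.wsgi_app)`.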

Bytespider User-Agent String

The user-agent token for Bytespider is:

Bytespider
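Because the token appears as a plain substring of the `User-Agent` header, you can search your access logs for it and check whether those requests hit paths your robots.txt disallows. The sketch below assumes the Apache/nginx "combined" log format and reuses `urllib.robotparser`; `find_violations` is a hypothetical helper, and the sample log lines and user-agent strings are made up. A match is a hint, not proof: RFC 9309 lets crawlers use a cached robots.txt for up to 24 hours, so hits shortly after a rule change can be legitimate.

```python
import re
from urllib.robotparser import RobotFileParser

# Apache/nginx "combined" format (assumed):
# IP ident user [time] "METHOD PATH PROTOCOL" status size "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def find_violations(log_lines, robots_txt, token="Bytespider"):
    """Return (ip, path) pairs where `token` requested a disallowed path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # skip lines in other formats
        if token.lower() not in match.group("ua").lower():
            continue  # not the crawler being audited
        if not parser.can_fetch(token, match.group("path")):
            violations.append((match.group("ip"), match.group("path")))
    return violations


ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /private/
Allow: /
"""

# Made-up sample lines; the user-agent strings are illustrative only.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /private/report.pdf HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.7 - - [10/Oct/2024:13:55:40 +0000] "GET /blog/post HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.3 - - [10/Oct/2024:13:56:01 +0000] "GET /private/report.pdf HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

print(find_violations(SAMPLE_LOG, ROBOTS_TXT))
# [('203.0.113.7', '/private/report.pdf')]
```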
