🦊 StackFox

Cohere Training Crawler

Tier 3
📚 AI Training · by Cohere · Since 2022

Collects LLM training data.

User-Agent Token: cohere-training-data-crawler
Respects robots.txt: Unknown
Impact Level: Notable
Estimated Reach: 10M+ users - Regional/specialized AI companies

🎯 What is Cohere Training Crawler?

Cohere Training Crawler is an AI training crawler operated by Cohere. It collects web content to train AI models.

📊 How Your Data is Used

Collected content is used to pre-train Cohere's enterprise LLMs.

🚫 What Happens If You Block

Your content won't be used to train Cohere's Command and Embed models.

💡 Good to Know

Cohere focuses on enterprise customers and competes with OpenAI and Anthropic in the B2B market.

๐ŸขAbout Cohere

Cohere

Cohere operates 2 known bots for AI model training.

๐Ÿ›ก๏ธcohere-training-data-crawler robots.txt Configuration

Control cohere-training-data-crawler access to your website using robots.txt directives.

Block cohere-training-data-crawler

To completely block Cohere Training Crawler from crawling your site:

User-agent: cohere-training-data-crawler
Disallow: /
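
Before deploying a blocking rule like the one above, it can help to confirm how a standard parser reads it. The following is a minimal sketch using Python's built-in urllib.robotparser. The helper name is_allowed, the example.com URL, and the second agent name SomeOtherBot are illustrative assumptions, not part of the original page. The check only shows how the rule is parsed; it does not show whether the crawler actually honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from this page, held as a string for local testing.
BLOCK_RULES = """\
User-agent: cohere-training-data-crawler
Disallow: /
"""

TOKEN = "cohere-training-data-crawler"


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com and SomeOtherBot are placeholders for illustration only.
print(is_allowed(BLOCK_RULES, TOKEN, "https://example.com/blog/post"))
print(is_allowed(BLOCK_RULES, "SomeOtherBot", "https://example.com/blog/post"))
```

Under this rule the first check reports the Cohere token as blocked, while agents that no group names are unaffected because the file has no `User-agent: *` group.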

Allow cohere-training-data-crawler Full Access

To explicitly allow Cohere Training Crawler to crawl your entire site:

User-agent: cohere-training-data-crawler
Allow: /

Selective Access for cohere-training-data-crawler

To allow Cohere Training Crawler but restrict certain directories:

User-agent: cohere-training-data-crawler
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
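
To see which paths the selective rules above leave open, the sketch below evaluates a few sample paths with Python's built-in urllib.robotparser. The helper check_paths, the example.com host, and the sample paths are assumptions chosen for illustration. Python's parser applies the first matching rule in file order, whereas RFC 9309 picks the most specific (longest) match; for this rule set both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The selective-access rules from this page, held as a string for local testing.
SELECTIVE_RULES = """\
User-agent: cohere-training-data-crawler
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""

TOKEN = "cohere-training-data-crawler"


def check_paths(robots_txt: str, agent: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to True (allowed) or False (blocked) for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path matters for matching.
    return {p: parser.can_fetch(agent, "https://example.com" + p) for p in paths}


sample_paths = ["/", "/blog/post", "/private/notes", "/api/v1/users", "/admin/"]
results = check_paths(SELECTIVE_RULES, TOKEN, sample_paths)
for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Rules match by path prefix, so `Disallow: /admin/` covers `/admin/` and everything beneath it but not a request for `/admin` without the trailing slash.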

cohere-training-data-crawler User-Agent String

The user-agent token for Cohere Training Crawler is:

cohere-training-data-crawler
