🦊 StackFox

Diffbot

Tier 3
📚 AI Training · by Diffbot · Since 2012

Structured data extraction and AI training.

User-Agent Token: Diffbot
Respects robots.txt: Yes
Impact Level: Notable (10M+ users; regional/specialized AI companies)
Estimated Reach: Customers include Microsoft, Adobe, Cisco, eBay, DuckDuckGo, Avast

🎯 What is Diffbot?

Diffbot is an AI training crawler operated by the company of the same name. It collects web data for structured extraction and for training AI models.

📊 How Your Data is Used

Diffbot builds what is billed as the world's largest commercial knowledge graph. It sells entity data to enterprises for competitive intelligence, lead generation, and AI training.

🚫 What Happens If You Block

Your content won't appear in the Diffbot Knowledge Graph (10B+ entities). This can affect how your site is represented in DuckDuckGo search, Avast security ratings, and enterprise market intelligence.

💡 Good to Know

Enterprise customers include Microsoft, Adobe, Cisco, eBay, and DuckDuckGo. Diffbot is also used by AI startups such as RelationalAI, and it powers anti-misinformation initiatives.

๐ŸขAbout Diffbot

Diffbot

Diffbot operates 1 known bot for AI model training. Its customers include Microsoft, Adobe, Cisco, eBay, DuckDuckGo, and Avast.

๐Ÿ›ก๏ธDiffbot robots.txt Configuration

Control Diffbot access to your website using robots.txt directives.

Block Diffbot

To completely block Diffbot from crawling your site:

User-agent: Diffbot
Disallow: /

Allow Diffbot Full Access

To explicitly allow Diffbot to crawl your entire site:

User-agent: Diffbot
Allow: /

Selective Access for Diffbot

To allow Diffbot but restrict certain directories:

User-agent: Diffbot
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
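
One way to sanity-check the selective-access rules above before deploying them is to parse them locally. The sketch below uses Python's standard urllib.robotparser. The sample paths and the Googlebot comparison are illustrative assumptions, and Python's parser applies the first matching rule, which may not be exactly how Diffbot itself resolves Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# The selective-access rules from this section, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise each rule.
for path in ("/blog/post", "/private/report.html", "/api/v1/users", "/admin/"):
    verdict = "allowed" if parser.can_fetch("Diffbot", path) else "blocked"
    print(f"Diffbot  {path:<22} {verdict}")

# Agents with no matching group (and no '*' group) are not restricted by these rules.
print("Googlebot /private/report.html allowed:",
      parser.can_fetch("Googlebot", "/private/report.html"))
```

Because this parser stops at the first rule that matches, the Disallow lines are listed before the catch-all Allow: / line, as in the snippet above.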

✓ Diffbot respects robots.txt directives.

Diffbot User-Agent String

The user-agent token for Diffbot is:

Diffbot
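
To spot this token in your own server logs, a case-insensitive substring match is a simple starting point. The sketch below is a minimal illustration: the helper name is_diffbot and the sample header values are hypothetical, since the full User-Agent header Diffbot sends is not given here. User-agent strings can also be spoofed, so treat a match as a hint rather than proof.

```python
from typing import Optional

# The user-agent token documented above.
DIFFBOT_TOKEN = "diffbot"


def is_diffbot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains the Diffbot token (case-insensitive)."""
    if not user_agent:
        return False
    return DIFFBOT_TOKEN in user_agent.lower()


# Hypothetical header values, for illustration only.
samples = [
    "Mozilla/5.0 (compatible; Diffbot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
    "",
]
for ua in samples:
    print(repr(ua), "->", is_diffbot(ua))
```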
