Ai2Bot-Dolma
Tier 4
Collects data for the Dolma dataset used to train open-source AI models.
What is Ai2Bot-Dolma?
Ai2Bot-Dolma is an AI training crawler operated by the Allen Institute for AI (AI2). It collects web data used to train AI models.
Dolma is an open dataset of 3 trillion tokens. It powers OLMo and other open models.
If you block Ai2Bot-Dolma, your content is excluded from the Dolma dataset and from models trained on it.
The crawler is part of AI2's commitment to open science. Dolma is fully documented and reproducible.
About Allen Institute for AI
The Allen Institute for AI operates 3 known bots for AI model training.
Ai2Bot-Dolma robots.txt Configuration
Control Ai2Bot-Dolma access to your website using robots.txt directives.
Block Ai2Bot-Dolma
To completely block Ai2Bot-Dolma from crawling your site:
User-agent: Ai2Bot-Dolma
Disallow: /
Allow Ai2Bot-Dolma Full Access
To explicitly allow Ai2Bot-Dolma to crawl your entire site:
User-agent: Ai2Bot-Dolma
Allow: /
Selective Access for Ai2Bot-Dolma
To allow Ai2Bot-Dolma but restrict certain directories:
User-agent: Ai2Bot-Dolma
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
Ai2Bot-Dolma respects robots.txt directives.
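Before deploying rules like the ones above, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard-library urllib.robotparser. The helper name can_fetch, the example.com URLs, and the SomeOtherBot agent are assumptions made for illustration. It shows how one parser interprets the rules and makes no claim about how Ai2Bot-Dolma itself evaluates them.

```python
from urllib.robotparser import RobotFileParser

# The selective-access rules from this page, as they would appear in robots.txt.
SELECTIVE_RULES = """\
User-agent: Ai2Bot-Dolma
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""

# The full-block rules from this page.
BLOCK_ALL_RULES = """\
User-agent: Ai2Bot-Dolma
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url.

    Hypothetical helper: it parses the rules from a string, so no
    network request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real deployment.
    for path in ("/blog/post", "/private/report.html", "/api/v1/items"):
        url = "https://example.com" + path
        print(path, can_fetch(SELECTIVE_RULES, "Ai2Bot-Dolma", url))
```

Python's parser applies the first rule that matches, while some crawlers pick the most specific (longest) matching rule. Listing the Disallow lines before Allow: /, as in the selective-access example, gives the same result under both interpretations.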
Ai2Bot-Dolma User-Agent String
The user-agent token for Ai2Bot-Dolma is:
Ai2Bot-Dolma
Who Blocks Ai2Bot-Dolma?
Other Allen Institute for AI Bots