🦊 StackFox

Ai2Bot-Dolma

Tier 4
📚 AI Training by Allen Institute for AI · Since 2023

Collects data for the Dolma dataset used to train open-source AI models.

User-Agent Token: Ai2Bot-Dolma
Respects robots.txt: Yes
Impact Level: Niche (smaller players and developer tools)
Estimated Reach: Smaller players and developer tools

🎯 What is Ai2Bot-Dolma?

Ai2Bot-Dolma is an AI training crawler operated by the Allen Institute for AI (AI2). It collects web content used to train AI models.

📊 How Your Data is Used

Dolma is an open dataset of 3 trillion tokens. It powers OLMo and other open models.

🚫 What Happens If You Block

Your content is left out of subsequent crawls for the Dolma dataset, and therefore out of models trained on it.

💡 Good to Know

Ai2Bot-Dolma is part of AI2's commitment to open science, and Dolma is fully documented and reproducible.

๐ŸขAbout Allen Institute for AI


The Allen Institute for AI operates three known bots for AI model training.

๐Ÿ›ก๏ธAi2Bot-Dolma robots.txt Configuration

Control Ai2Bot-Dolma's access to your website with robots.txt directives.

Block Ai2Bot-Dolma

To completely block Ai2Bot-Dolma from crawling your site:

User-agent: Ai2Bot-Dolma
Disallow: /
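To confirm that a rule like this does what you expect before deploying it, you can evaluate it locally with Python's standard-library `urllib.robotparser`. This is a minimal sketch: `example.com` and the `Googlebot` comparison are placeholders, and `robotparser`'s matching logic is not necessarily identical to the parser Ai2Bot-Dolma itself uses.

```python
from urllib.robotparser import RobotFileParser

# The "block" rule from above, as it would appear in robots.txt.
BLOCK_RULES = """\
User-agent: Ai2Bot-Dolma
Disallow: /
"""

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Ai2Bot-Dolma is blocked from every path...
    print(can_crawl(BLOCK_RULES, "Ai2Bot-Dolma", "https://example.com/blog/post"))  # False
    # ...while crawlers without a matching group are unaffected.
    print(can_crawl(BLOCK_RULES, "Googlebot", "https://example.com/blog/post"))  # True
```

Only the group that names Ai2Bot-Dolma changes anything; other crawlers keep whatever access the rest of the file gives them.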

Allow Ai2Bot-Dolma Full Access

To explicitly allow Ai2Bot-Dolma to crawl your entire site:

User-agent: Ai2Bot-Dolma
Allow: /
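On its own, `Allow: /` changes nothing, because a crawler with no matching rules may already fetch everything. It becomes meaningful when a catch-all `User-agent: *` group blocks other crawlers: under RFC 9309, a crawler that finds a group naming it follows that group instead of the `*` group. The sketch below checks this with Python's `urllib.robotparser`; the combined file and the `OtherBot` name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler except Ai2Bot-Dolma.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Ai2Bot-Dolma
Allow: /
"""

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # The named group takes precedence over the catch-all group.
    print(can_crawl(ROBOTS_TXT, "Ai2Bot-Dolma", "https://example.com/docs/"))  # True
    # A crawler without its own group falls back to "*" and is blocked.
    print(can_crawl(ROBOTS_TXT, "OtherBot", "https://example.com/docs/"))  # False
```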

Selective Access for Ai2Bot-Dolma

To allow Ai2Bot-Dolma but restrict certain directories:

User-agent: Ai2Bot-Dolma
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
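Directory rules match by path prefix, so the trailing slash matters: `Disallow: /admin/` covers `/admin/users` but not `/admin` itself or `/administrator`. The sketch below checks the selective configuration with Python's `urllib.robotparser`; the URLs are hypothetical. One caveat: `robotparser` applies the first matching rule in file order, whereas RFC 9309 picks the longest (most specific) match. Listing the `Disallow` lines before `Allow: /`, as above, gives the same result under both.

```python
from urllib.robotparser import RobotFileParser

# The "selective access" rules from above.
SELECTIVE_RULES = """\
User-agent: Ai2Bot-Dolma
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    base = "https://example.com"
    # "/admin" is allowed because the rule is "/admin/" (with a trailing slash).
    for path in ["/blog/post", "/private/notes.html", "/api/v1/items", "/admin", "/admin/users"]:
        allowed = can_crawl(SELECTIVE_RULES, "Ai2Bot-Dolma", base + path)
        print(f"{path:22} {'allowed' if allowed else 'blocked'}")
```

To block the bare `/admin` path as well, a rule without the trailing slash (`Disallow: /admin`) would be needed, at the cost of also matching paths such as `/administrator`.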

โœ“ Ai2Bot-Dolma respects robots.txt directives.

Ai2Bot-Dolma User-Agent String

The user-agent token for Ai2Bot-Dolma is:

Ai2Bot-Dolma
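Because only the token is given here, not the full User-Agent header, a log filter should look for the token inside a longer string. A minimal sketch, assuming the header contains `Ai2Bot-Dolma` as a distinct token; the sample log lines and the full User-Agent value in them are hypothetical. User-Agent headers can be spoofed, so a match shows what a client claims to be, not who actually sent the request.

```python
import re

# Match the token as a standalone word, case-insensitively, so
# "Ai2Bot-Dolma/1.0" matches but "Ai2Bot-DolmaX" does not.
TOKEN_RE = re.compile(r"(?<![\w-])Ai2Bot-Dolma(?![\w-])", re.IGNORECASE)

def is_ai2bot_dolma(user_agent: str) -> bool:
    """Return True if the string contains the Ai2Bot-Dolma token."""
    return TOKEN_RE.search(user_agent) is not None

def count_ai2bot_dolma_hits(log_lines) -> int:
    """Count log lines whose text contains the token anywhere in the line."""
    return sum(1 for line in log_lines if is_ai2bot_dolma(line))

if __name__ == "__main__":
    # Hypothetical access-log lines, for illustration only.
    sample_log = [
        '203.0.113.7 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Ai2Bot-Dolma)"',
        '198.51.100.2 - - [01/Jan/2024:00:00:01 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    print(count_ai2bot_dolma_hits(sample_log))  # 1
```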

Check Your Site's AI Policy

See if you're blocking or allowing Ai2Bot-Dolma and other AI crawlers.
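One way to run this check yourself is to evaluate your robots.txt against several AI crawler tokens at once. The sketch below uses Python's `urllib.robotparser`; the token list (`Ai2Bot-Dolma`, `GPTBot`, `CCBot`) and the sample file are illustrative, and the check looks at a single URL (the site root by default), so path-specific rules need their own URLs.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI crawler user-agent tokens to check.
AI_CRAWLERS = ["Ai2Bot-Dolma", "GPTBot", "CCBot"]

# Sample robots.txt: block Ai2Bot-Dolma, allow everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: *
Allow: /
"""

def ai_crawler_policy(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each crawler token to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}

if __name__ == "__main__":
    for bot, allowed in ai_crawler_policy(SAMPLE_ROBOTS_TXT).items():
        print(f"{bot:14} {'allowed' if allowed else 'blocked'}")
```

For a live site, `RobotFileParser("https://example.com/robots.txt")` followed by `.read()` fetches and parses the file, after which `can_fetch` works the same way.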