🦊 StackFox

Internet Archive Bot

Tier 3
โ“ Unknownby Internet Archive โ†—ยท Since 1996

Crawls pages for the Wayback Machine web archive.

User-Agent Token
archive.org_bot
Respects robots.txt
Yes
Impact Level
Notable
Estimated Reach
10M+ users - Regional/specialized AI companies

🎯 What is Internet Archive Bot?

Internet Archive Bot is a web crawler operated by the Internet Archive. It crawls pages for the Wayback Machine web archive.

📊 How Your Data is Used

Crawled pages are kept for non-profit preservation, and the historical copies are made accessible to everyone through the Wayback Machine.

🚫 What Happens If You Block

Your site won't be preserved in the Wayback Machine.

💡 Good to Know

The Internet Archive is a non-profit whose mission is to preserve the web, which makes its crawls important for the historical record.

๐ŸขAbout Internet Archive

Internet Archive

Internet Archive operates 2 known bots for web crawling.

๐Ÿ›ก๏ธarchive.org_bot robots.txt Configuration

Control archive.org_bot access to your website using robots.txt directives.

Block archive.org_bot

To completely block Internet Archive Bot from crawling your site:

User-agent: archive.org_bot
Disallow: /

Allow archive.org_bot Full Access

To explicitly allow Internet Archive Bot to crawl your entire site:

User-agent: archive.org_bot
Allow: /

Selective Access for archive.org_bot

To allow Internet Archive Bot but restrict certain directories:

User-agent: archive.org_bot
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
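
Before deploying rules like the ones above, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name is_allowed and the sample paths are illustrative assumptions, not part of any Internet Archive tooling. It reports what Python's parser concludes (it applies rules in file order), which is not a guarantee of how the crawler itself resolves them.

```python
from urllib.robotparser import RobotFileParser

# Token taken from this page; the rules mirror the selective-access example.
BOT_TOKEN = "archive.org_bot"

SELECTIVE_RULES = """\
User-agent: archive.org_bot
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""


def is_allowed(robots_txt: str, path: str, user_agent: str = BOT_TOKEN) -> bool:
    """Return True if Python's parser lets `user_agent` fetch `path`.

    `is_allowed` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are made up for illustration.
    for path in ("/", "/blog/post.html", "/private/notes.html", "/api/v1/items"):
        verdict = "allowed" if is_allowed(SELECTIVE_RULES, path) else "blocked"
        print(f"{path}: {verdict}")
```

The same helper can be pointed at the block-all or allow-all snippets by passing their text as robots_txt.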

✓ Internet Archive Bot respects robots.txt directives.

archive.org_bot User-Agent String

The user-agent token for Internet Archive Bot is:

archive.org_bot
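
To spot this crawler in server logs or request handlers, one option is to look for the token inside the User-Agent header. The sketch below assumes the token appears as a substring of the full header value and matches it case-insensitively; the function name is_internet_archive_bot and the sample header values are made up for illustration, since this page does not document the full User-Agent string.

```python
from typing import Optional

# Token taken from this page.
BOT_TOKEN = "archive.org_bot"


def is_internet_archive_bot(user_agent_header: Optional[str]) -> bool:
    """Return True if the header value contains the archive.org_bot token.

    Assumption: the token appears verbatim (in any letter case) somewhere
    in the User-Agent header. A header can be spoofed, so treat a match
    as a hint rather than proof of the sender's identity.
    """
    if not user_agent_header:
        return False
    return BOT_TOKEN in user_agent_header.lower()


if __name__ == "__main__":
    # Hypothetical header values, not captured from real traffic.
    samples = [
        "Mozilla/5.0 (compatible; archive.org_bot)",
        "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
        None,
    ]
    for value in samples:
        print(repr(value), is_internet_archive_bot(value))
```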

