Internet Archive Bot
Tier 3
Crawls pages for the Wayback Machine web archive.
archive.org_bot
What is Internet Archive Bot?
Internet Archive Bot is a web crawler operated by the Internet Archive. It crawls pages for the Wayback Machine web archive.
The Internet Archive is a non-profit with a mission to preserve the web. Its crawler makes historical copies of pages accessible to everyone, which matters for the historical record.
If you block the crawler, your site won't be preserved in the Wayback Machine.
About Internet Archive
Internet Archive operates 2 known bots for web crawling.
archive.org_bot robots.txt Configuration
Control archive.org_bot access to your website using robots.txt directives.
Block archive.org_bot
To completely block Internet Archive Bot from crawling your site:
User-agent: archive.org_bot
Disallow: /
Allow archive.org_bot Full Access
To explicitly allow Internet Archive Bot to crawl your entire site:
User-agent: archive.org_bot
Allow: /
Selective Access for archive.org_bot
To allow Internet Archive Bot but restrict certain directories:
User-agent: archive.org_bot
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
Internet Archive Bot respects robots.txt directives.
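Before deploying one of the three configurations above, it can help to check how a parser interprets it. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the sample paths are assumptions made for illustration, and the standard-library parser may not match the crawler's own rule evaluation in every edge case.

```python
from urllib.robotparser import RobotFileParser

BOT = "archive.org_bot"

# The three configurations from this page, as robots.txt text.
BLOCK_ALL = "User-agent: archive.org_bot\nDisallow: /\n"
ALLOW_ALL = "User-agent: archive.org_bot\nAllow: /\n"
SELECTIVE = (
    "User-agent: archive.org_bot\n"
    "Disallow: /private/\n"
    "Disallow: /api/\n"
    "Disallow: /admin/\n"
    "Allow: /\n"
)


def is_allowed(robots_txt: str, path: str, agent: str = BOT) -> bool:
    """Hypothetical helper: parse robots.txt text and test one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Print the decision for a few sample paths under each configuration.
for name, rules in [("block", BLOCK_ALL), ("allow", ALLOW_ALL), ("selective", SELECTIVE)]:
    for path in ("/", "/blog/post", "/private/notes"):
        print(name, path, is_allowed(rules, path))
```

Each group applies only to the user agent it names, so other crawlers are unaffected by these rules unless they have their own group or a wildcard group.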
archive.org_bot User-Agent String
The user-agent token for Internet Archive Bot is:
archive.org_bot
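To see whether a request in your server logs came from this crawler, you can look for the token in the User-Agent header. The sketch below assumes a simple case-insensitive substring match. The function name is_archive_bot and the sample header values are made up for illustration and are not the crawler's documented full user-agent string.

```python
from typing import Optional

BOT_TOKEN = "archive.org_bot"


def is_archive_bot(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent header contains the bot's token."""
    if not user_agent:
        return False
    # Compare in lowercase so differences in casing do not matter.
    return BOT_TOKEN in user_agent.lower()


# Hypothetical header values, for illustration only.
samples = [
    "ExampleFetcher/1.0 (compatible; archive.org_bot)",
    "ExampleBrowser/5.0 (X11; Linux x86_64)",
]
for ua in samples:
    print(ua, is_archive_bot(ua))
```

A User-Agent header can be set to anything by the client, so a match shows only what the request claims to be.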