🦊 StackFox

Scrapy

Tier 4

Open-source web scraping framework. This is the user-agent it sends when run with default settings.

User-Agent Token: Scrapy
Respects robots.txt: Unknown
Impact Level: Niche (smaller players and developer tools)
Estimated Reach: Smaller players and developer tools

🎯 What is Scrapy?

Scrapy is an open-source web crawling framework maintained by the Scrapy Project. Crawlers built with it are run by individual developers, so no single purpose is documented.

📊 How Your Data is Used

Scrapy is a framework used by developers, so how collected data is used varies by implementation.

🚫 What Happens If You Block

Blocking stops Scrapy crawlers that keep the default user-agent. Users can change the user-agent, so crawlers that send a custom one are not affected.

💡 Good to Know

Scrapy is a popular Python scraping framework. Blocking only affects crawlers that use the default user-agent.

🏢 About Scrapy Project

Scrapy Project

Scrapy Project operates 1 known bot for web crawling.

🛡️ Scrapy robots.txt Configuration

Control Scrapy's access to your website using robots.txt directives.

Block Scrapy

To completely block Scrapy from crawling your site:

User-agent: Scrapy
Disallow: /
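
The effect of this rule can be checked locally before deploying it. The sketch below uses Python's standard `urllib.robotparser` to evaluate the rule against two user-agents. It assumes a crawler that actually honors robots.txt. The version number in the default-style string, the `ExampleCrawler` name, and the `example.com` URL are illustrative placeholders, not values taken from this page.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from this section.
ROBOTS_TXT = """\
User-agent: Scrapy
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Default-style user-agent; the version number is a placeholder.
default_ua = "Scrapy/2.11.0 (+https://scrapy.org)"
# Hypothetical custom user-agent set by a developer.
custom_ua = "ExampleCrawler/1.0"

print(is_allowed(ROBOTS_TXT, default_ua, "https://example.com/page"))  # False
print(is_allowed(ROBOTS_TXT, custom_ua, "https://example.com/page"))   # True
```

The second result illustrates the caveat noted earlier: the rule matches only the `Scrapy` token, so a crawler that sends a different user-agent is not covered by it.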

Allow Scrapy Full Access

To explicitly allow Scrapy to crawl your entire site:

User-agent: Scrapy
Allow: /

Selective Access for Scrapy

To allow Scrapy but restrict certain directories:

User-agent: Scrapy
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
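
As a rough check of the selective rules above, the following sketch evaluates a handful of paths with Python's standard `urllib.robotparser`. The sample paths and the `example.com` host are assumptions chosen for illustration. Python's parser applies the first matching rule in file order, which for this particular rule set gives the same answers as longest-match evaluation.

```python
from urllib.robotparser import RobotFileParser

# The selective-access rules from this section.
RULES = """\
User-agent: Scrapy
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""

# Hypothetical paths used only to exercise the rules.
PATHS = ["/", "/blog/post", "/private/report", "/api/v1/items", "/admin/"]


def check_paths(rules: str, user_agent: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to whether user_agent may fetch it under rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {
        path: parser.can_fetch(user_agent, "https://example.com" + path)
        for path in paths
    }


results = check_paths(RULES, "Scrapy", PATHS)
for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Under these assumptions, `/` and `/blog/post` come back allowed, while the paths under `/private/`, `/api/`, and `/admin/` come back blocked.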

Scrapy User-Agent String

The user-agent token for Scrapy is:

Scrapy
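
For log analysis, a minimal sketch of spotting this token in a request's User-Agent header might look like the following. The helper name `has_scrapy_token` and the sample header values are hypothetical. It uses a simple case-insensitive substring match, so it only catches crawlers that keep the token, and it can also match unrelated agents that merely mention "scrapy".

```python
def has_scrapy_token(user_agent_header: str) -> bool:
    """Return True if the header contains the Scrapy token (case-insensitive)."""
    return "scrapy" in user_agent_header.lower()


# Sample header values; the version numbers are placeholders.
samples = [
    "Scrapy/2.11.0 (+https://scrapy.org)",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "",
]

for header in samples:
    print(repr(header), has_scrapy_token(header))
```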
