Scrapy
Tier 4 - Unknown, by Scrapy Project
Open-source web scraping framework. This is the user-agent it sends when run with default settings.
User-Agent Token
Scrapy
Respects robots.txt
Unknown
Impact Level
Niche
Smaller players and developer tools
Estimated Reach
Smaller players and developer tools
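What is Scrapy?
Scrapy is an open-source Python framework for building web crawlers, maintained by the Scrapy Project. It is not a single crawler with one operator: requests carrying this user-agent come from developers running the framework with its default settings, so no single purpose is documented.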
How Your Data is Used
Framework used by developers. Purpose varies by implementation.
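What Happens If You Block
Blocking stops Scrapy crawlers that keep the default user-agent and honor robots.txt. Developers can change the user-agent, so customized crawlers are not affected.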
Good to Know
Scrapy is a popular Python scraping framework. A robots.txt block only affects crawlers that keep the default user-agent.
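To illustrate why a block only reaches the default user-agent, here is a minimal sketch of the two Scrapy settings involved, plus a small helper for recognizing the default header. USER_AGENT and ROBOTSTXT_OBEY are real Scrapy settings; the bot name, the example.com URL, the helper name, and the exact regular expression are assumptions made for illustration.

```python
import re

# Sketch of a settings.py for a hypothetical Scrapy project.
# If USER_AGENT is left unset, Scrapy identifies itself as
# "Scrapy/<version> (+https://scrapy.org)", which "User-agent: Scrapy" rules match.
USER_AGENT = "examplebot/1.0 (+https://example.com/bot)"  # hypothetical custom value

# When True, Scrapy's robots.txt middleware filters out disallowed requests.
ROBOTSTXT_OBEY = True

# Assumed pattern for the default header; older releases may differ slightly.
DEFAULT_SCRAPY_UA = re.compile(r"^Scrapy/\S+ \(\+https?://scrapy\.org\)$")


def is_default_scrapy_ua(user_agent: str) -> bool:
    """Return True if the header looks like Scrapy's default user-agent."""
    return DEFAULT_SCRAPY_UA.match(user_agent) is not None
```

With a custom USER_AGENT like the one above, rules written for the Scrapy token would no longer apply to that crawler, which is why blocking by user-agent should be treated as a courtesy signal and not as access control.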
About Scrapy Project
Scrapy robots.txt Configuration
Control Scrapy access to your website using robots.txt directives.
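As a rough way to check how rules of this kind are interpreted, the sketch below feeds a robots.txt body to Python's standard urllib.robotparser and asks whether a given user-agent may fetch a URL. The rule set, the example.com URLs, and the helper name can_fetch are illustrative assumptions, and real crawlers may use a different parser with slightly different matching behavior.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: restrict a few directories for Scrapy, allow the rest.
RULES = """\
User-agent: Scrapy
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""


def can_fetch(rules: str, user_agent: str, url: str) -> bool:
    """Parse a robots.txt body and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    default_ua = "Scrapy/2.11.0 (+https://scrapy.org)"  # default-style header
    custom_ua = "examplebot/1.0"                        # hypothetical custom header
    for ua in (default_ua, custom_ua):
        for path in ("/private/data", "/blog/post"):
            url = "https://example.com" + path
            print(ua, path, can_fetch(RULES, ua, url))
```

In this parser, the default-style Scrapy header is refused for /private/ but allowed elsewhere, while the custom header is not matched by the Scrapy group at all and is therefore allowed everywhere.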
Block Scrapy
To completely block Scrapy from crawling your site:
User-agent: Scrapy
Disallow: /
Allow Scrapy Full Access
To explicitly allow Scrapy to crawl your entire site:
User-agent: Scrapy
Allow: /
Selective Access for Scrapy
To allow Scrapy but restrict certain directories:
User-agent: Scrapy
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
Scrapy User-Agent String
The user-agent token for Scrapy is:
Scrapy
Who Blocks Scrapy?
Blocking (49 sites)
01net.com, 10best.com, 15min.lt, 1881.no, 20min.ch, 24chasa.bg, 3dvf.com, 47news.jp, 4p.de, 4players.de, 588ku.com, 8newsnow.com, 9news.com.au, abc27.com, abc4.com, a.co, adm-nao.ru, advrider.com, adweek.com, afr.com, aftenbladet.no, aftenposten.no, aftonbladet.se, aftvnews.com, agrigentonotizie.it, alfemminile.com, allabolag.se, allgaeuer-zeitung.de, all-in.de, allofapps.com, alltheweb.com, alrosa.ru, altavista.com, amac.us, amarillo.com, amazon.ae, amazonalexa.com, amazon.ca, amazon.cl, amazon.co.jp, amazon.com, amazon.com.au, amazon.com.br, amazon.com.co, amazon.com.mx, amazon.com.tr, amazon.co.uk, amazon.co.za, amazon.de
Explicitly Allowing (1 site)