🦊 StackFox
Leipzig Corpora Collection

Tier 4

Academic linguistic corpus builder from the University of Leipzig.

User-Agent Token: LCC
Respects robots.txt: Yes
Impact Level: Niche (smaller players and developer tools)
Estimated Reach: Smaller players and developer tools

🎯 What is Leipzig Corpora Collection?

Leipzig Corpora Collection (LCC) is an AI training crawler operated by the University of Leipzig. It collects web text to build linguistic corpora, which are used for research and for training NLP models.

📊 How Your Data is Used

Academic linguistic research and NLP model training.

🚫 What Happens If You Block

Your content won't be included in linguistic research corpora.

💡 Good to Know

LCC is a non-profit academic research project. The data it collects is used for linguistic studies.

🏢 About University of Leipzig

The University of Leipzig operates one known bot for AI model training.

🛡️ LCC robots.txt Configuration

Control LCC access to your website using robots.txt directives.

Block LCC

To completely block Leipzig Corpora Collection from crawling your site:

User-agent: LCC
Disallow: /

Allow LCC Full Access

To explicitly allow Leipzig Corpora Collection to crawl your entire site:

User-agent: LCC
Allow: /

Selective Access for LCC

To allow Leipzig Corpora Collection but restrict certain directories:

User-agent: LCC
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
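
Before deploying any of the three rule sets above, it can help to check locally how a standard parser reads them. The following is a minimal sketch using Python's standard-library urllib.robotparser. The helper name lcc_can_fetch, the example.com host, and the sample paths are illustrative assumptions, not part of any LCC tooling. Python's parser applies the first matching rule in file order, which gives the intended result for the rule order shown here; how the LCC crawler itself resolves overlapping rules is not documented on this page.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets from this page, as they would appear in robots.txt.
BLOCK = """\
User-agent: LCC
Disallow: /
"""

ALLOW = """\
User-agent: LCC
Allow: /
"""

SELECTIVE = """\
User-agent: LCC
Disallow: /private/
Disallow: /api/
Disallow: /admin/
Allow: /
"""


def lcc_can_fetch(robots_txt: str, path: str) -> bool:
    """Return True if the given rules let the LCC token fetch `path`.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path is matched against rules.
    return parser.can_fetch("LCC", "https://example.com" + path)


if __name__ == "__main__":
    rule_sets = [("block", BLOCK), ("allow", ALLOW), ("selective", SELECTIVE)]
    sample_paths = ["/", "/blog/post", "/private/notes", "/api/v1/items"]
    for name, rules in rule_sets:
        for path in sample_paths:
            verdict = "allowed" if lcc_can_fetch(rules, path) else "blocked"
            print(f"{name:9} {path:16} {verdict}")
```

Running the script prints one line per rule set and path, so a mistyped directory or a misplaced Allow line shows up before the file goes live.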

✓ Leipzig Corpora Collection respects robots.txt directives.

LCC User-Agent String

The user-agent token for Leipzig Corpora Collection is:

LCC
