🦊 StackFox

What Does StackFox Detect?

StackFox analyzes websites to discover their complete technology stack, infrastructure, security posture, and AI integrations.

🛠️ Free Tools

StackFox provides free tools to help you manage AI crawler policies for your website. No signup required.

📝 robots.txt Generator

Features

  • Control 60+ AI crawlers with simple toggles
  • Quick actions: block all training bots or block by company
  • Top 5 most important bots highlighted
  • Import existing robots.txt rules
  • Export to multiple formats
  • Verify deployment after publishing

Export Formats

  • robots.txt - Standard format for any server
  • nginx - map directive for nginx servers
  • Cloudflare - WAF Custom Rule expression
  • Vercel/Next.js - Edge Middleware code
  • Apache - .htaccess RewriteRule directives
Pro tip: After configuring rules, use the "Verify Deployment" feature to confirm your robots.txt is live and contains all your rules.
Open robots.txt Generator →
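For a sense of what the "Verify Deployment" step involves, here is a minimal sketch: fetch the live robots.txt and check that each configured rule appears in it. The rule list and the substring matching are illustrative assumptions, not StackFox's implementation.

```ts
// Sketch: verify that a deployed robots.txt contains the rules you configured.
// The expectedRules list is illustrative; substitute the rules you exported.
const expectedRules = [
  "User-agent: GPTBot",
  "User-agent: ClaudeBot",
  "Disallow: /",
];

async function verifyRobotsTxt(domain: string): Promise<boolean> {
  const res = await fetch(`https://${domain}/robots.txt`);
  if (!res.ok) {
    console.error(`robots.txt not reachable (HTTP ${res.status})`);
    return false;
  }
  const body = await res.text();
  // Naive substring check; a real verifier would parse user-agent groups.
  const missing = expectedRules.filter((rule) => !body.includes(rule));
  if (missing.length > 0) {
    console.warn("Rules not found in live file:", missing);
    return false;
  }
  return true;
}

verifyRobotsTxt("example.com").then((ok) =>
  console.log(ok ? "All rules deployed" : "Deployment incomplete"),
);
```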

📊 robots.txt Grader

What It Does

  • Analyzes your robots.txt against 60+ AI bots
  • Grades your AI policy from A+ to F
  • Shows blocked vs. allowed bots by category
  • Identifies important bots that are missing rules
  • Provides actionable recommendations

How Grades Work

  • A+ - Blocks training bots, allows search bots (recommended)
  • A - Good training bot coverage
  • B - Partial protection or full block
  • C - Basic rules, incomplete coverage
  • D - No AI-specific policy
  • F - Blocks everything (hurts visibility)
Tip: Enter your domain to fetch your live robots.txt, or paste the content directly to analyze it.
Open robots.txt Grader →
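The rubric above can be approximated with a simple heuristic once you know which training and search bots a robots.txt blocks. The thresholds and the PolicySummary shape below are assumptions for illustration, not StackFox's actual grading algorithm.

```ts
// Sketch of a simplified grading heuristic (illustrative thresholds only).
type Grade = "A+" | "A" | "B" | "C" | "D" | "F";

interface PolicySummary {
  trainingBotsBlocked: number; // e.g. GPTBot, ClaudeBot, CCBot...
  trainingBotsTotal: number;
  searchBotsBlocked: number;   // e.g. OAI-SearchBot, PerplexityBot...
  searchBotsTotal: number;
  blanketDisallowAll: boolean; // "User-agent: *" followed by "Disallow: /"
}

function gradePolicy(p: PolicySummary): Grade {
  if (p.blanketDisallowAll) return "F"; // blocks everything, hurts visibility
  const trainingCoverage = p.trainingBotsBlocked / p.trainingBotsTotal;
  const noAiRules = p.trainingBotsBlocked === 0 && p.searchBotsBlocked === 0;
  if (noAiRules) return "D";            // no AI-specific policy
  const fullAiBlock =
    trainingCoverage === 1 && p.searchBotsBlocked === p.searchBotsTotal;
  if (fullAiBlock) return "B";          // full block of all AI bots
  if (trainingCoverage >= 0.9 && p.searchBotsBlocked === 0) return "A+";
  if (trainingCoverage >= 0.8) return "A";
  if (trainingCoverage >= 0.4) return "B"; // partial protection
  return "C";                           // basic rules, incomplete coverage
}
```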

🤖 AI Policy Tracking

StackFox tracks how websites handle AI crawlers and training data collection. We analyze both robots.txt and llms.txt to determine each site's AI policy.

What We Track

  • Which AI bots are explicitly blocked
  • Which AI bots are explicitly allowed
  • Training vs search bot distinctions
  • llms.txt file presence and contents

Why It Matters

  • Understand if your content trains AI models
  • See how competitors handle AI crawlers
  • Discover industry best practices
  • Make informed decisions about your policy

📋 robots.txt AI Bot Rules

We parse robots.txt files to identify rules for 60+ known AI crawlers:

| Company      | Training Bot       | Search Bot       | User Bot             |
|--------------|--------------------|------------------|----------------------|
| OpenAI       | GPTBot             | OAI-SearchBot    | ChatGPT-User         |
| Anthropic    | ClaudeBot          | Claude-SearchBot | Claude-User          |
| Google       | Google-Extended    | —                | Gemini-*             |
| Meta         | meta-externalagent | —                | Meta-ExternalFetcher |
| Apple        | Applebot-Extended  | —                | —                    |
| Perplexity   | —                  | PerplexityBot    | Perplexity-User      |
| ByteDance    | Bytespider         | —                | —                    |
| Common Crawl | CCBot              | —                | —                    |
  • Training Bots - Collect data for AI model training
  • Search Bots - Power AI-enhanced search results
  • User Bots - Fetch pages for user requests

Sources: OpenAI, Anthropic, Google, ai-robots-txt
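As a rough illustration of this parsing step, the sketch below walks robots.txt line by line, tracks the current User-agent group, and records whether each known AI bot is blocked. The bot list is truncated, and real-world parsing (wildcards, path-specific rules, BOMs) needs more care than shown here.

```ts
// Sketch: map robots.txt groups to known AI crawlers (simplified parsing).
const AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider"];

function parseAiRules(robotsTxt: string): Record<string, "blocked" | "allowed"> {
  const result: Record<string, "blocked" | "allowed"> = {};
  let agents: string[] = [];
  let inRules = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim(); // strip comments
    const colon = line.indexOf(":");
    if (!line || colon < 0) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      if (inRules) { agents = []; inRules = false; } // a new group starts
      agents.push(value);
    } else if (field === "disallow" || field === "allow") {
      inRules = true;
      for (const agent of agents) {
        const bot = AI_BOTS.find((b) => b.toLowerCase() === agent.toLowerCase());
        if (bot) {
          result[bot] = field === "disallow" && value === "/" ? "blocked" : "allowed";
        }
      }
    }
  }
  return result;
}
```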

📄 llms.txt Files

llms.txt is an emerging standard for websites to provide information to LLMs about their content.

What We Check

  • /llms.txt at site root
  • /.well-known/llms.txt
  • Content validation (not HTML 404s)
  • File size limits (under 100KB)

What We Store

  • Full file contents cached in R2
  • AI provider mentions detected
  • Permission declarations parsed
  • Last crawl timestamp
Note: llms.txt is informational only. Unlike robots.txt, it provides context FOR LLMs rather than blocking them. To block AI crawlers, use robots.txt directives instead.
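A minimal sketch of the kind of check described above: try both locations, skip soft-404s that return an HTML error page with a 200 status, and enforce the size limit. The exact validation StackFox performs may differ.

```ts
// Sketch: locate and sanity-check an llms.txt file.
const MAX_BYTES = 100 * 1024; // 100KB limit mentioned above

async function fetchLlmsTxt(domain: string): Promise<string | null> {
  for (const path of ["/llms.txt", "/.well-known/llms.txt"]) {
    const res = await fetch(`https://${domain}${path}`);
    if (!res.ok) continue;
    const body = await res.text();
    const head = body.trimStart().slice(0, 100).toLowerCase();
    // Skip HTML error pages served with a 200 status ("soft 404s").
    if (head.startsWith("<!doctype") || head.startsWith("<html")) continue;
    if (new TextEncoder().encode(body).length > MAX_BYTES) continue;
    return body;
  }
  return null;
}
```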

📊 AI Bot Research & Insights

StackFox continuously crawls the web's top sites to generate industry-wide insights about AI crawler policies.

What We Crawl

  • Top 1000+ sites by traffic (Tranco list)
  • robots.txt from every domain
  • llms.txt at root and .well-known paths
  • Regular re-crawls for policy changes

Insights Generated

  • % of sites blocking each AI crawler
  • Most blocked bots by company
  • Training vs search bot distinctions
  • Industry trends over time
  • 60+ AI bots tracked
  • 1000+ sites crawled
  • 8 AI companies
  • Daily updates
Research Output: Check our Insights page for live statistics on AI bot blocking across the web. We also publish research reports on AI crawling trends.
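To illustrate how a statistic like "% of sites blocking each AI crawler" falls out of the crawl data, here is a sketch over a hypothetical CrawlRecord shape; StackFox's actual schema and pipeline are not described in this document.

```ts
// Sketch: share of crawled sites that block each AI bot.
// CrawlRecord is a hypothetical shape, not StackFox's schema.
interface CrawlRecord {
  domain: string;
  blockedBots: string[]; // bots explicitly disallowed in robots.txt
}

function blockRates(records: CrawlRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    for (const bot of new Set(record.blockedBots)) {
      counts.set(bot, (counts.get(bot) ?? 0) + 1);
    }
  }
  const rates = new Map<string, number>();
  for (const [bot, count] of counts) {
    rates.set(bot, (count / records.length) * 100); // percent of sites
  }
  return rates;
}
```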

📦 Technologies & Frameworks

We detect 4,000+ technologies across these categories:

Frontend

  • JavaScript frameworks (React, Vue, Angular, Svelte)
  • UI libraries (Tailwind, shadcn/ui, Chakra, MUI)
  • State management (Redux, Zustand, Jotai)
  • Build tools (Webpack, Vite, Turbopack)

Backend

  • Web frameworks (Next.js, Remix, Rails, Django)
  • Databases (PostgreSQL, MongoDB, Redis)
  • CMS platforms (WordPress, Contentful, Sanity)
  • Authentication (Auth0, Clerk, Supabase Auth)

Services

  • Analytics (Google Analytics, Plausible, Mixpanel)
  • Payment processors (Stripe, PayPal, Square)
  • Marketing tools (HubSpot, Mailchimp, Intercom)
  • CDNs (Cloudflare, Fastly, Vercel Edge)

Real-time

  • WebSocket services (Pusher, Ably, Socket.io)
  • Collaboration (Liveblocks, PartyKit)
  • Live features (Supabase Realtime, Phoenix)

Detection uses HTML patterns, JavaScript globals, HTTP headers, CSS variables, and network requests.
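A few illustrative detection rules, sketched in TypeScript. The fingerprints shown (for example the __NEXT_DATA__ marker or the cf-ray header) are common public signals; StackFox's real fingerprint database is far larger and is not reproduced here.

```ts
// Sketch: signature-based technology detection over a page snapshot.
interface PageSnapshot {
  html: string;
  headers: Record<string, string>; // lowercased header names
  jsGlobals: string[];             // globals collected in a headless browser
}

const RULES: { tech: string; test: (p: PageSnapshot) => boolean }[] = [
  { tech: "Next.js",    test: (p) => p.html.includes("__NEXT_DATA__") },
  { tech: "React",      test: (p) => p.jsGlobals.includes("React") },
  { tech: "WordPress",  test: (p) => p.html.includes("/wp-content/") },
  { tech: "Cloudflare", test: (p) => "cf-ray" in p.headers },
];

function detectTechnologies(page: PageSnapshot): string[] {
  return RULES.filter((rule) => rule.test(page)).map((rule) => rule.tech);
}
```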

🤖 AI & Machine Learning

StackFox specializes in detecting AI integrations that other tools miss:

LLM Providers

OpenAI, Anthropic, Google AI, Mistral, Cohere, Groq, Together AI

AI Frameworks

LangChain, LlamaIndex, Vercel AI SDK, Hugging Face

Vector Databases

Pinecone, Weaviate, Chroma, Qdrant, Milvus

AI Chatbots

Intercom Fin, Drift, Zendesk AI, custom implementations

AI Observability

LangSmith, Helicone, PromptLayer, Weights & Biases

AI Infrastructure

Replicate, Modal, Baseten, RunPod, GPU clouds

We detect AI through API calls, SDK imports, chat widgets, and embedded models.
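For the API-call signal specifically, detection can be as simple as matching outbound request hostnames against known provider endpoints. The endpoint list below is a small illustrative subset, not StackFox's full mapping.

```ts
// Sketch: classify outbound request hostnames into AI providers.
const AI_ENDPOINTS: Record<string, string> = {
  "api.openai.com": "OpenAI",
  "api.anthropic.com": "Anthropic",
  "generativelanguage.googleapis.com": "Google AI",
  "api.mistral.ai": "Mistral",
  "api.cohere.com": "Cohere",
};

function detectAiProviders(requestUrls: string[]): Set<string> {
  const providers = new Set<string>();
  for (const url of requestUrls) {
    const host = new URL(url).hostname;
    if (AI_ENDPOINTS[host]) providers.add(AI_ENDPOINTS[host]);
  }
  return providers;
}
```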

🔒 Security Headers

We analyze HTTP security headers and grade sites from A+ to F:

| Header                    | Purpose                            |
|---------------------------|------------------------------------|
| Strict-Transport-Security | Forces HTTPS connections           |
| Content-Security-Policy   | Prevents XSS and injection attacks |
| X-Frame-Options           | Prevents clickjacking              |
| X-Content-Type-Options    | Prevents MIME sniffing             |
| Referrer-Policy           | Controls referrer information      |
| Permissions-Policy        | Controls browser features          |

A+ = All headers present. F = Critical headers missing.
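A minimal sketch of this check: request the page, see which of the headers above are present, and map the number of missing headers to a grade. The thresholds are assumptions; the real rubric also weighs how critical each missing header is.

```ts
// Sketch: grade a site's security headers (illustrative thresholds).
const SECURITY_HEADERS = [
  "strict-transport-security",
  "content-security-policy",
  "x-frame-options",
  "x-content-type-options",
  "referrer-policy",
  "permissions-policy",
];

async function gradeSecurityHeaders(url: string): Promise<string> {
  const res = await fetch(url, { redirect: "follow" });
  const missing = SECURITY_HEADERS.filter((h) => !res.headers.has(h));
  if (missing.length === 0) return "A+";
  if (missing.length <= 1) return "A";
  if (missing.length <= 2) return "B";
  if (missing.length <= 3) return "C";
  if (missing.length <= 4) return "D";
  return "F";
}
```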

⚡ Performance Metrics

We measure real page load performance and grade sites A+ to F:

Load Time

Total time until page is fully loaded

DOM Content Loaded

Time until HTML is parsed and ready

First Paint

Time until first pixel is rendered

First Contentful Paint

Time until first content appears

Performance benchmarking compares your site against others with similar tech stacks.
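These metrics map directly onto the browser's Performance API. The sketch below runs in the page context (for example, injected through a headless browser) and is a simplified illustration rather than StackFox's measurement harness.

```ts
// Sketch: collect the metrics above from the Performance API (page context).
function collectMetrics() {
  const [nav] = performance.getEntriesByType(
    "navigation",
  ) as PerformanceNavigationTiming[];
  const paints = performance.getEntriesByType("paint");
  const firstPaint = paints.find((p) => p.name === "first-paint");
  const fcp = paints.find((p) => p.name === "first-contentful-paint");

  return {
    loadTime: nav.loadEventEnd,                     // page fully loaded
    domContentLoaded: nav.domContentLoadedEventEnd, // HTML parsed and ready
    firstPaint: firstPaint?.startTime ?? null,      // first pixel rendered
    firstContentfulPaint: fcp?.startTime ?? null,   // first content appears
  };
}
```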

🔧 DNS & Infrastructure

We query DNS records to identify infrastructure providers:

Email Provider

Google Workspace, Microsoft 365, Zoho, custom SMTP

Detected via MX records

DNS Provider

Cloudflare, Route53, Google Cloud DNS, DNSimple

Detected via NS records

Email Security

SPF, DKIM, DMARC policy detection

Detected via TXT records

Hosting

Vercel, Netlify, AWS, GCP, Render

Detected via CNAME/A records
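A sketch of these lookups using Node's built-in dns module. The provider-matching strings are illustrative examples of well-known MX and NS patterns, not an exhaustive mapping.

```ts
// Sketch: infer providers from MX, NS, and TXT records.
import { resolveMx, resolveNs, resolveTxt } from "node:dns/promises";

async function inspectDns(domain: string) {
  const [mx, ns, txt] = await Promise.all([
    resolveMx(domain).catch(() => []),
    resolveNs(domain).catch(() => []),
    resolveTxt(domain).catch(() => []),
  ]);

  const emailProvider = mx.some((r) => r.exchange.endsWith("google.com"))
    ? "Google Workspace"
    : mx.some((r) => r.exchange.endsWith("protection.outlook.com"))
      ? "Microsoft 365"
      : "Custom/Other";

  const dnsProvider = ns.some((n) => n.endsWith("ns.cloudflare.com"))
    ? "Cloudflare"
    : ns.some((n) => n.includes("awsdns"))
      ? "Route53"
      : "Other";

  const hasSpf = txt.some((chunks) => chunks.join("").startsWith("v=spf1"));
  return { emailProvider, dnsProvider, hasSpf };
}
```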

🔌 API Services

We monitor network requests during page load to detect external services:

  • Payment APIs: Stripe, PayPal, Square endpoints
  • Auth APIs: Auth0, Clerk, Supabase authentication calls
  • Analytics: Google Analytics, Segment, Amplitude events
  • CDN/Assets: Cloudflare, Fastly, imgix image delivery
  • WebSockets: Real-time connections for live features
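A sketch of this monitoring step using a headless browser. Puppeteer is used here only as an example instrumentation layer, and the hostname map is a small illustrative subset.

```ts
// Sketch: watch network requests during page load and tag known services.
import puppeteer from "puppeteer";

const KNOWN_SERVICES: Record<string, string> = {
  "js.stripe.com": "Stripe",
  "www.google-analytics.com": "Google Analytics",
  "cdn.segment.com": "Segment",
};

async function detectServices(url: string): Promise<Set<string>> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const found = new Set<string>();

  page.on("request", (request) => {
    const host = new URL(request.url()).hostname;
    if (KNOWN_SERVICES[host]) found.add(KNOWN_SERVICES[host]);
  });

  await page.goto(url, { waitUntil: "networkidle0" });
  await browser.close();
  return found;
}
```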

🌐 Subdomains

We discover subdomains using Certificate Transparency logs:

When a site requests an SSL certificate, the certificate is recorded in public Certificate Transparency logs. We query these logs to find subdomains like:

api.example.com, app.example.com, staging.example.com, admin.example.com, docs.example.com

This reveals how the site's infrastructure is organized and which services it runs.
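One public CT search service, crt.sh, exposes certificate records as JSON; the sketch below assumes its name_value field (newline-separated hostnames per certificate) and shows only one way to query CT data.

```ts
// Sketch: enumerate subdomains from Certificate Transparency via crt.sh.
async function findSubdomains(domain: string): Promise<string[]> {
  const res = await fetch(`https://crt.sh/?q=%25.${domain}&output=json`);
  const entries: { name_value: string }[] = await res.json();

  const names = new Set<string>();
  for (const entry of entries) {
    for (const name of entry.name_value.split("\n")) {
      // Keep concrete subdomains; skip wildcard entries like *.example.com.
      if (name.endsWith(`.${domain}`) && !name.includes("*")) names.add(name);
    }
  }
  return [...names].sort();
}
```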

Ready to analyze a site?

See all of this in action.

Analyze a Site