🦊 StackFox

What Does StackFox Detect?

StackFox analyzes websites to discover their complete technology stack, infrastructure, security posture, and AI integrations.

🛠️ Free Tools

StackFox provides free tools to help you manage AI crawler policies for your website. No signup required.

📝 robots.txt Generator

Features

  • Control 60+ AI crawlers with simple toggles
  • Quick actions: block all training bots or block by company
  • Top 5 most important bots highlighted
  • Import existing robots.txt rules
  • Export to multiple formats
  • Verify deployment after publishing

Export Formats

  • robots.txt - Standard format for any server
  • nginx - map directive for nginx servers
  • Cloudflare - WAF Custom Rule expression
  • Vercel/Next.js - Edge Middleware code
  • Apache - .htaccess RewriteRule directives
Pro tip: After configuring rules, use the "Verify Deployment" feature to confirm your robots.txt is live and contains all your rules.
Open robots.txt Generator →
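For a sense of what the "Verify Deployment" step involves, here is a minimal sketch: fetch the live robots.txt and check that each configured rule appears in it. The rule list and the substring matching are illustrative assumptions, not StackFox's implementation.

```ts
// Sketch: verify that a deployed robots.txt contains the rules you configured.
// The expectedRules list is illustrative; substitute the rules you exported.
const expectedRules = [
  "User-agent: GPTBot",
  "User-agent: ClaudeBot",
  "Disallow: /",
];

async function verifyRobotsTxt(domain: string): Promise<boolean> {
  const res = await fetch(`https://${domain}/robots.txt`);
  if (!res.ok) {
    console.error(`robots.txt not reachable (HTTP ${res.status})`);
    return false;
  }
  const body = await res.text();
  // Naive substring check; a real verifier would parse user-agent groups.
  const missing = expectedRules.filter((rule) => !body.includes(rule));
  if (missing.length > 0) {
    console.warn("Rules not found in live file:", missing);
    return false;
  }
  return true;
}

verifyRobotsTxt("example.com").then((ok) =>
  console.log(ok ? "All rules deployed" : "Deployment incomplete"),
);
```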

📊 robots.txt Grader

What It Does

  • Analyzes your robots.txt against 60+ AI bots
  • Grades your AI policy from A+ to F
  • Shows blocked vs. allowed bots by category
  • Identifies important bots that are missing rules
  • Provides actionable recommendations

How Grades Work

  • A+ - Blocks training bots, allows search bots (recommended)
  • A - Good training bot coverage
  • B - Partial protection or full block
  • C - Basic rules, incomplete coverage
  • D - No AI-specific policy
  • F - Blocks everything (hurts visibility)
Tip: Enter your domain to fetch your live robots.txt, or paste the content directly to analyze it.
Open robots.txt Grader →
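The rubric above can be approximated with a simple heuristic once you know which training and search bots a robots.txt blocks. The thresholds and the PolicySummary shape below are assumptions for illustration, not StackFox's actual grading algorithm.

```ts
// Sketch of a simplified grading heuristic (illustrative thresholds only).
type Grade = "A+" | "A" | "B" | "C" | "D" | "F";

interface PolicySummary {
  trainingBotsBlocked: number; // e.g. GPTBot, ClaudeBot, CCBot...
  trainingBotsTotal: number;
  searchBotsBlocked: number;   // e.g. OAI-SearchBot, PerplexityBot...
  searchBotsTotal: number;
  blanketDisallowAll: boolean; // "User-agent: *" followed by "Disallow: /"
}

function gradePolicy(p: PolicySummary): Grade {
  if (p.blanketDisallowAll) return "F"; // blocks everything, hurts visibility
  const trainingCoverage = p.trainingBotsBlocked / p.trainingBotsTotal;
  const noAiRules = p.trainingBotsBlocked === 0 && p.searchBotsBlocked === 0;
  if (noAiRules) return "D";            // no AI-specific policy
  const fullAiBlock =
    trainingCoverage === 1 && p.searchBotsBlocked === p.searchBotsTotal;
  if (fullAiBlock) return "B";          // full block of all AI bots
  if (trainingCoverage >= 0.9 && p.searchBotsBlocked === 0) return "A+";
  if (trainingCoverage >= 0.8) return "A";
  if (trainingCoverage >= 0.4) return "B"; // partial protection
  return "C";                           // basic rules, incomplete coverage
}
```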

🤖 AI Policy Tracking

StackFox tracks how websites handle AI crawlers and training data collection. We analyze both robots.txt and llms.txt to determine each site's AI policy.

What We Track

  • Which AI bots are explicitly blocked
  • Which AI bots are explicitly allowed
  • Training vs search bot distinctions
  • llms.txt file presence and contents

Why It Matters

  • Understand if your content trains AI models
  • See how competitors handle AI crawlers
  • Discover industry best practices
  • Make informed decisions about your policy

📋 robots.txt AI Bot Rules

We parse robots.txt files to identify rules for 60+ known AI crawlers:

| Company      | Training Bot       | Search Bot       | User Bot             |
|--------------|--------------------|------------------|----------------------|
| OpenAI       | GPTBot             | OAI-SearchBot    | ChatGPT-User         |
| Anthropic    | ClaudeBot          | Claude-SearchBot | Claude-User          |
| Google       | Google-Extended    | —                | Gemini-*             |
| Meta         | meta-externalagent | —                | Meta-ExternalFetcher |
| Apple        | Applebot-Extended  | —                | —                    |
| Perplexity   | —                  | PerplexityBot    | Perplexity-User      |
| ByteDance    | Bytespider         | —                | —                    |
| Common Crawl | CCBot              | —                | —                    |
  • Training Bots - Collect data for AI model training
  • Search Bots - Power AI-enhanced search results
  • User Bots - Fetch pages for user requests

Sources: OpenAI, Anthropic, Google, ai-robots-txt
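As a rough illustration of this parsing step, the sketch below walks robots.txt line by line, tracks the current User-agent group, and records whether each known AI bot is blocked. The bot list is truncated, and real-world parsing (wildcards, path-specific rules, BOMs) needs more care than shown here.

```ts
// Sketch: map robots.txt groups to known AI crawlers (simplified parsing).
const AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider"];

function parseAiRules(robotsTxt: string): Record<string, "blocked" | "allowed"> {
  const result: Record<string, "blocked" | "allowed"> = {};
  let agents: string[] = [];
  let inRules = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim(); // strip comments
    const colon = line.indexOf(":");
    if (!line || colon < 0) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      if (inRules) { agents = []; inRules = false; } // a new group starts
      agents.push(value);
    } else if (field === "disallow" || field === "allow") {
      inRules = true;
      for (const agent of agents) {
        const bot = AI_BOTS.find((b) => b.toLowerCase() === agent.toLowerCase());
        if (bot) {
          result[bot] = field === "disallow" && value === "/" ? "blocked" : "allowed";
        }
      }
    }
  }
  return result;
}
```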

📄 llms.txt Files

llms.txt is an emerging standard for websites to provide information to LLMs about their content.

What We Check

  • /llms.txt at site root
  • /.well-known/llms.txt
  • Content validation (not HTML 404s)
  • File size limits (under 100KB)

What We Store

  • Full file contents cached in R2
  • AI provider mentions detected
  • Permission declarations parsed
  • Last crawl timestamp
Note: llms.txt is informational only. Unlike robots.txt, it provides context FOR LLMs rather than blocking them. To block AI crawlers, use robots.txt directives instead.
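A minimal sketch of the kind of check described above: try both locations, skip soft-404s that return an HTML error page with a 200 status, and enforce the size limit. The exact validation StackFox performs may differ.

```ts
// Sketch: locate and sanity-check an llms.txt file.
const MAX_BYTES = 100 * 1024; // 100KB limit mentioned above

async function fetchLlmsTxt(domain: string): Promise<string | null> {
  for (const path of ["/llms.txt", "/.well-known/llms.txt"]) {
    const res = await fetch(`https://${domain}${path}`);
    if (!res.ok) continue;
    const body = await res.text();
    const head = body.trimStart().slice(0, 100).toLowerCase();
    // Skip HTML error pages served with a 200 status ("soft 404s").
    if (head.startsWith("<!doctype") || head.startsWith("<html")) continue;
    if (new TextEncoder().encode(body).length > MAX_BYTES) continue;
    return body;
  }
  return null;
}
```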

📊 AI Bot Research & Insights

StackFox continuously crawls the web's top sites to generate industry-wide insights about AI crawler policies.

What We Crawl

  • Top 1000+ sites by traffic (Tranco list)
  • robots.txt from every domain
  • llms.txt at root and .well-known paths
  • Regular re-crawls for policy changes

Insights Generated

  • % of sites blocking each AI crawler
  • Most blocked bots by company
  • Training vs search bot distinctions
  • Industry trends over time
  • 60+ AI bots tracked
  • 1000+ sites crawled
  • 8 AI companies
  • Daily updates
Research Output: Check our Insights page for live statistics on AI bot blocking across the web. We also publish research reports on AI crawling trends.
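To illustrate how a statistic like "% of sites blocking each AI crawler" falls out of the crawl data, here is a sketch over a hypothetical CrawlRecord shape; StackFox's actual schema and pipeline are not described in this document.

```ts
// Sketch: share of crawled sites that block each AI bot.
// CrawlRecord is a hypothetical shape, not StackFox's schema.
interface CrawlRecord {
  domain: string;
  blockedBots: string[]; // bots explicitly disallowed in robots.txt
}

function blockRates(records: CrawlRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    for (const bot of new Set(record.blockedBots)) {
      counts.set(bot, (counts.get(bot) ?? 0) + 1);
    }
  }
  const rates = new Map<string, number>();
  for (const [bot, count] of counts) {
    rates.set(bot, (count / records.length) * 100); // percent of sites
  }
  return rates;
}
```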

📦 Technologies & Frameworks

We detect 4,000+ technologies across these categories:

Frontend

  • JavaScript frameworks (React, Vue, Angular, Svelte)
  • UI libraries (Tailwind, shadcn/ui, Chakra, MUI)
  • State management (Redux, Zustand, Jotai)
  • Build tools (Webpack, Vite, Turbopack)

Backend

  • Web frameworks (Next.js, Remix, Rails, Django)
  • Databases (PostgreSQL, MongoDB, Redis)
  • CMS platforms (WordPress, Contentful, Sanity)
  • Authentication (Auth0, Clerk, Supabase Auth)

Services

  • Analytics (Google Analytics, Plausible, Mixpanel)
  • Payment processors (Stripe, PayPal, Square)
  • Marketing tools (HubSpot, Mailchimp, Intercom)
  • CDNs (Cloudflare, Fastly, Vercel Edge)

Real-time

  • WebSocket services (Pusher, Ably, Socket.io)
  • Collaboration (Liveblocks, PartyKit)
  • Live features (Supabase Realtime, Phoenix)

Detection uses HTML patterns, JavaScript globals, HTTP headers, CSS variables, and network requests.
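A few illustrative detection rules, sketched in TypeScript. The fingerprints shown (for example the __NEXT_DATA__ marker or the cf-ray header) are common public signals; StackFox's real fingerprint database is far larger and is not reproduced here.

```ts
// Sketch: signature-based technology detection over a page snapshot.
interface PageSnapshot {
  html: string;
  headers: Record<string, string>; // lowercased header names
  jsGlobals: string[];             // globals collected in a headless browser
}

const RULES: { tech: string; test: (p: PageSnapshot) => boolean }[] = [
  { tech: "Next.js",    test: (p) => p.html.includes("__NEXT_DATA__") },
  { tech: "React",      test: (p) => p.jsGlobals.includes("React") },
  { tech: "WordPress",  test: (p) => p.html.includes("/wp-content/") },
  { tech: "Cloudflare", test: (p) => "cf-ray" in p.headers },
];

function detectTechnologies(page: PageSnapshot): string[] {
  return RULES.filter((rule) => rule.test(page)).map((rule) => rule.tech);
}
```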

🤖 AI & Machine Learning

StackFox specializes in detecting AI integrations that other tools miss:

LLM Providers

OpenAI, Anthropic, Google AI, Mistral, Cohere, Groq, Together AI

AI Frameworks

LangChain, LlamaIndex, Vercel AI SDK, Hugging Face

Vector Databases

Pinecone, Weaviate, Chroma, Qdrant, Milvus

AI Chatbots

Intercom Fin, Drift, Zendesk AI, custom implementations

AI Observability

LangSmith, Helicone, PromptLayer, Weights & Biases

AI Infrastructure

Replicate, Modal, Baseten, RunPod, GPU clouds

We detect AI through API calls, SDK imports, chat widgets, and embedded models.
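For the API-call signal specifically, detection can be as simple as matching outbound request hostnames against known provider endpoints. The endpoint list below is a small illustrative subset, not StackFox's full mapping.

```ts
// Sketch: classify outbound request hostnames into AI providers.
const AI_ENDPOINTS: Record<string, string> = {
  "api.openai.com": "OpenAI",
  "api.anthropic.com": "Anthropic",
  "generativelanguage.googleapis.com": "Google AI",
  "api.mistral.ai": "Mistral",
  "api.cohere.com": "Cohere",
};

function detectAiProviders(requestUrls: string[]): Set<string> {
  const providers = new Set<string>();
  for (const url of requestUrls) {
    const host = new URL(url).hostname;
    if (AI_ENDPOINTS[host]) providers.add(AI_ENDPOINTS[host]);
  }
  return providers;
}
```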

🔒 Security Headers

We analyze HTTP security headers and grade sites from A+ to F:

| Header                    | Purpose                            |
|---------------------------|------------------------------------|
| Strict-Transport-Security | Forces HTTPS connections           |
| Content-Security-Policy   | Prevents XSS and injection attacks |
| X-Frame-Options           | Prevents clickjacking              |
| X-Content-Type-Options    | Prevents MIME sniffing             |
| Referrer-Policy           | Controls referrer information      |
| Permissions-Policy        | Controls browser features          |

A+ = All headers present. F = Critical headers missing.
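A minimal sketch of this check: request the page, see which of the headers above are present, and map the number of missing headers to a grade. The thresholds are assumptions; the real rubric also weighs how critical each missing header is.

```ts
// Sketch: grade a site's security headers (illustrative thresholds).
const SECURITY_HEADERS = [
  "strict-transport-security",
  "content-security-policy",
  "x-frame-options",
  "x-content-type-options",
  "referrer-policy",
  "permissions-policy",
];

async function gradeSecurityHeaders(url: string): Promise<string> {
  const res = await fetch(url, { redirect: "follow" });
  const missing = SECURITY_HEADERS.filter((h) => !res.headers.has(h));
  if (missing.length === 0) return "A+";
  if (missing.length <= 1) return "A";
  if (missing.length <= 2) return "B";
  if (missing.length <= 3) return "C";
  if (missing.length <= 4) return "D";
  return "F";
}
```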

⚡ Performance Metrics

We measure real page load performance and grade sites A+ to F:

Load Time

Total time until page is fully loaded

DOM Content Loaded

Time until HTML is parsed and ready

First Paint

Time until first pixel is rendered

First Contentful Paint

Time until first content appears

Performance benchmarking compares your site against others with similar tech stacks.
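These metrics map directly onto the browser's Performance API. The sketch below runs in the page context (for example, injected through a headless browser) and is a simplified illustration rather than StackFox's measurement harness.

```ts
// Sketch: collect the metrics above from the Performance API (page context).
function collectMetrics() {
  const [nav] = performance.getEntriesByType(
    "navigation",
  ) as PerformanceNavigationTiming[];
  const paints = performance.getEntriesByType("paint");
  const firstPaint = paints.find((p) => p.name === "first-paint");
  const fcp = paints.find((p) => p.name === "first-contentful-paint");

  return {
    loadTime: nav.loadEventEnd,                     // page fully loaded
    domContentLoaded: nav.domContentLoadedEventEnd, // HTML parsed and ready
    firstPaint: firstPaint?.startTime ?? null,      // first pixel rendered
    firstContentfulPaint: fcp?.startTime ?? null,   // first content appears
  };
}
```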

🔧 DNS & Infrastructure

We query DNS records to identify infrastructure providers:

Email Provider

Google Workspace, Microsoft 365, Zoho, custom SMTP

Detected via MX records

DNS Provider

Cloudflare, Route53, Google Cloud DNS, DNSimple

Detected via NS records

Email Security

SPF, DKIM, DMARC policy detection

Detected via TXT records

Hosting

Vercel, Netlify, AWS, GCP, Render

Detected via CNAME/A records
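A sketch of these lookups using Node's built-in dns module. The provider-matching strings are illustrative examples of well-known MX and NS patterns, not an exhaustive mapping.

```ts
// Sketch: infer providers from MX, NS, and TXT records.
import { resolveMx, resolveNs, resolveTxt } from "node:dns/promises";

async function inspectDns(domain: string) {
  const [mx, ns, txt] = await Promise.all([
    resolveMx(domain).catch(() => []),
    resolveNs(domain).catch(() => []),
    resolveTxt(domain).catch(() => []),
  ]);

  const emailProvider = mx.some((r) => r.exchange.endsWith("google.com"))
    ? "Google Workspace"
    : mx.some((r) => r.exchange.endsWith("protection.outlook.com"))
      ? "Microsoft 365"
      : "Custom/Other";

  const dnsProvider = ns.some((n) => n.endsWith("ns.cloudflare.com"))
    ? "Cloudflare"
    : ns.some((n) => n.includes("awsdns"))
      ? "Route53"
      : "Other";

  const hasSpf = txt.some((chunks) => chunks.join("").startsWith("v=spf1"));
  return { emailProvider, dnsProvider, hasSpf };
}
```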

🔌 API Services

We monitor network requests during page load to detect external services:

  • Payment APIs: Stripe, PayPal, Square endpoints
  • Auth APIs: Auth0, Clerk, Supabase authentication calls
  • Analytics: Google Analytics, Segment, Amplitude events
  • CDN/Assets: Cloudflare, Fastly, imgix image delivery
  • WebSockets: Real-time connections for live features
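A sketch of this monitoring step using a headless browser. Puppeteer is used here only as an example instrumentation layer, and the hostname map is a small illustrative subset.

```ts
// Sketch: watch network requests during page load and tag known services.
import puppeteer from "puppeteer";

const KNOWN_SERVICES: Record<string, string> = {
  "js.stripe.com": "Stripe",
  "www.google-analytics.com": "Google Analytics",
  "cdn.segment.com": "Segment",
};

async function detectServices(url: string): Promise<Set<string>> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const found = new Set<string>();

  page.on("request", (request) => {
    const host = new URL(request.url()).hostname;
    if (KNOWN_SERVICES[host]) found.add(KNOWN_SERVICES[host]);
  });

  await page.goto(url, { waitUntil: "networkidle0" });
  await browser.close();
  return found;
}
```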

🌐 Subdomains

We discover subdomains using Certificate Transparency logs:

When a site requests an SSL certificate, the certificate is recorded in public Certificate Transparency logs. We query these logs to find subdomains like:

api.example.com, app.example.com, staging.example.com, admin.example.com, docs.example.com

This reveals how the site's infrastructure is organized and which services it runs.
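One public CT search service, crt.sh, exposes certificate records as JSON; the sketch below assumes its name_value field (newline-separated hostnames per certificate) and shows only one way to query CT data.

```ts
// Sketch: enumerate subdomains from Certificate Transparency via crt.sh.
async function findSubdomains(domain: string): Promise<string[]> {
  const res = await fetch(`https://crt.sh/?q=%25.${domain}&output=json`);
  const entries: { name_value: string }[] = await res.json();

  const names = new Set<string>();
  for (const entry of entries) {
    for (const name of entry.name_value.split("\n")) {
      // Keep concrete subdomains; skip wildcard entries like *.example.com.
      if (name.endsWith(`.${domain}`) && !name.includes("*")) names.add(name);
    }
  }
  return [...names].sort();
}
```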

Ready to analyze a site?

See all of this in action.

Analyze a Site