What Does StackFox Detect?
StackFox analyzes websites to discover their complete technology stack, infrastructure, security posture, and AI integrations.
Free Tools
StackFox provides free tools to help you manage AI crawler policies for your website. No signup required.
robots.txt Generator
Features
- Control 60+ AI crawlers with simple toggles
- Quick actions: block training, block by company
- Top 5 most important bots highlighted
- Import existing robots.txt rules
- Export to multiple formats
- Verify deployment after publishing
Export Formats
- robots.txt: Standard format for any server
- nginx: Map directive for nginx servers
- Cloudflare: WAF Custom Rule expression
- Vercel/Next.js: Edge Middleware code
- Apache: .htaccess RewriteRules
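As a rough illustration of the plain robots.txt export, the Python sketch below builds a file that blocks training bots and allows search bots. The bot names come from the robots.txt AI Bot Rules table on this page; the function name `build_robots_txt` and the two short lists are hypothetical and are not StackFox's actual generator or its full 60+ bot list.

```python
# Illustrative subsets only; names are taken from the bot table on this page.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
SEARCH_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]


def build_robots_txt(block: list[str], allow: list[str]) -> str:
    """Return robots.txt text that disallows `block` bots and allows `allow` bots.

    Hypothetical helper: one group per bot, then a permissive default group.
    """
    groups = []
    for bot in block:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    for bot in allow:
        groups.append(f"User-agent: {bot}\nAllow: /")
    # All other crawlers keep default access.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"
```

Calling `build_robots_txt(TRAINING_BOTS, SEARCH_BOTS)` returns text that can be saved as `/robots.txt`. The other export formats (nginx, Cloudflare, Vercel, Apache) would need their own templates and are not sketched here.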
robots.txt Grader
What It Does
- Analyzes your robots.txt against 60+ AI bots
- Grades your AI policy from A+ to F
- Shows blocked vs allowed bots by category
- Identifies missing important bot rules
- Provides actionable recommendations
How Grades Work
- A+: Block training, allow search (recommended)
- A: Good training bot coverage
- B: Partial protection or full block
- C: Basic rules, incomplete coverage
- D: No AI-specific policy
- F: Blocks everything (hurts visibility)
AI Policy Tracking
StackFox tracks how websites handle AI crawlers and training data collection. We analyze both robots.txt and llms.txt to determine each site's AI policy.
What We Track
- Which AI bots are explicitly blocked
- Which AI bots are explicitly allowed
- Training vs search bot distinctions
- llms.txt file presence and contents
Why It Matters
- Understand if your content trains AI models
- See how competitors handle AI crawlers
- Discover industry best practices
- Make informed decisions about your policy
robots.txt AI Bot Rules
We parse robots.txt files to identify rules for 60+ known AI crawlers:
| Company | Training Bot | Search Bot | User Bot |
|---|---|---|---|
| OpenAI | GPTBot | OAI-SearchBot | ChatGPT-User |
| Anthropic | ClaudeBot | Claude-SearchBot | Claude-User |
| Google | Google-Extended | - | Gemini-* |
| Meta | meta-externalagent | - | Meta-ExternalFetcher |
| Apple | Applebot-Extended | - | - |
| Perplexity | - | PerplexityBot | Perplexity-User |
| ByteDance | Bytespider | - | - |
| Common Crawl | CCBot | - | - |
- Training bots: collect data for AI model training
- Search bots: power AI-enhanced search results
- User bots: fetch pages for user requests
Sources: OpenAI, Anthropic, Google, ai-robots-txt
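To make the "explicitly blocked" versus "explicitly allowed" distinction concrete, here is a minimal Python sketch of a robots.txt group parser. It is an assumption-laden simplification, not StackFox's parser: the helper names are hypothetical, only a site-wide `Disallow: /` counts as a block, and wildcard path matching is ignored.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # user-agents of the group being read
    reading_agents = False    # True while consecutive User-agent lines continue
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a new group starts here
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current:
                groups[agent].append((field, value))
    return groups


def explicit_policy(robots_txt: str, bots: list[str]) -> dict[str, str]:
    """Label each bot 'blocked', 'allowed', or 'unspecified' (no group names it)."""
    groups = parse_groups(robots_txt)
    result = {}
    for bot in bots:
        rules = groups.get(bot.lower())
        if rules is None:
            result[bot] = "unspecified"
        elif ("disallow", "/") in rules:
            result[bot] = "blocked"  # simplification: only a full-site block counts
        else:
            result[bot] = "allowed"
    return result
```

A bot that falls back to the `User-agent: *` group is reported as "unspecified" here, which keeps explicit rules separate from inherited defaults. Python's standard `urllib.robotparser` answers a different question (whether a given URL may be fetched) and applies that fallback automatically.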
llms.txt Files
llms.txt is an emerging standard for websites to provide information to LLMs about their content.
What We Check
- /llms.txt at site root
- /.well-known/llms.txt
- Content validation (not HTML 404s)
- File size limits (under 100KB)
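The checks above could be approximated as follows. This is a sketch under stated assumptions: the function name `is_valid_llms_txt` is hypothetical, the caller is assumed to have already fetched the response, and the HTML test is a simple heuristic rather than StackFox's actual validation.

```python
MAX_BYTES = 100 * 1024  # "under 100KB", per the list above
CANDIDATE_PATHS = ["/llms.txt", "/.well-known/llms.txt"]  # paths to try, in order


def is_valid_llms_txt(status: int, content_type: str, body: bytes) -> bool:
    """Heuristically decide whether a fetched response is a real llms.txt file."""
    if status != 200:
        return False
    if len(body) >= MAX_BYTES:
        return False
    # Many sites answer unknown paths with an HTML error page and a 200 status.
    if "text/html" in content_type.lower():
        return False
    head = body[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return False
    return bool(body.strip())  # reject empty files
```

A caller would request each entry in `CANDIDATE_PATHS` and keep the first response for which this function returns `True`.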
What We Store
- Full file contents cached in R2
- AI provider mentions detected
- Permission declarations parsed
- Last crawl timestamp
AI Bot Research & Insights
StackFox continuously crawls the web's top sites to generate industry-wide insights about AI crawler policies.
What We Crawl
- Top 1,000+ sites by traffic (Tranco list)
- robots.txt from every domain
- llms.txt at root and .well-known paths
- Regular re-crawls for policy changes
Insights Generated
- % of sites blocking each AI crawler
- Most blocked bots by company
- Training vs search bot distinctions
- Industry trends over time
Technologies & Frameworks
We detect 4,000+ technologies across these categories:
Frontend
- JavaScript frameworks (React, Vue, Angular, Svelte)
- UI libraries (Tailwind, shadcn/ui, Chakra, MUI)
- State management (Redux, Zustand, Jotai)
- Build tools (Webpack, Vite, Turbopack)
Backend
- Web frameworks (Next.js, Remix, Rails, Django)
- Databases (PostgreSQL, MongoDB, Redis)
- CMS platforms (WordPress, Contentful, Sanity)
- Authentication (Auth0, Clerk, Supabase Auth)
Services
- Analytics (Google Analytics, Plausible, Mixpanel)
- Payment processors (Stripe, PayPal, Square)
- Marketing tools (HubSpot, Mailchimp, Intercom)
- CDNs (Cloudflare, Fastly, Vercel Edge)
Real-time
- WebSocket services (Pusher, Ably, Socket.io)
- Collaboration (Liveblocks, PartyKit)
- Live features (Supabase Realtime, Phoenix)
Detection uses HTML patterns, JavaScript globals, HTTP headers, CSS variables, and network requests.
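Signature-based detection of this kind can be sketched in a few lines of Python. The `SIGNATURES` table below is a tiny illustrative set built from well-known public fingerprints; it is an assumption for demonstration and not StackFox's rule set, and it covers only HTML patterns and HTTP headers.

```python
import re

# Hypothetical signature table: regexes to look for in HTML and in header values.
SIGNATURES = {
    "Next.js": {
        "html": [r'id="__NEXT_DATA__"'],
        "headers": {"x-powered-by": r"next\.js"},
    },
    "WordPress": {"html": [r"/wp-content/"], "headers": {}},
    "Cloudflare": {"html": [], "headers": {"server": r"cloudflare"}},
}


def detect(html: str, headers: dict[str, str]) -> set[str]:
    """Return the names of technologies whose signatures match the page."""
    lowered = {key.lower(): value for key, value in headers.items()}
    found = set()
    for name, sig in SIGNATURES.items():
        html_hit = any(re.search(p, html, re.IGNORECASE) for p in sig["html"])
        header_hit = any(
            re.search(pattern, lowered.get(header, ""), re.IGNORECASE)
            for header, pattern in sig["headers"].items()
        )
        if html_hit or header_hit:
            found.add(name)
    return found
```

JavaScript globals, CSS variables, and network requests require a real browser session to observe, so they are left out of this sketch.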
AI & Machine Learning
StackFox specializes in detecting AI integrations that other tools miss:
LLM Providers
OpenAI, Anthropic, Google AI, Mistral, Cohere, Groq, Together AI
AI Frameworks
LangChain, LlamaIndex, Vercel AI SDK, Hugging Face
Vector Databases
Pinecone, Weaviate, Chroma, Qdrant, Milvus
AI Chatbots
Intercom Fin, Drift, Zendesk AI, custom implementations
AI Observability
LangSmith, Helicone, Promptlayer, Weights & Biases
AI Infrastructure
Replicate, Modal, Baseten, RunPod, GPU clouds
We detect AI through API calls, SDK imports, chat widgets, and embedded models.
Security Headers
We analyze HTTP security headers and grade sites from A+ to F:
| Header | Purpose |
|---|---|
| Strict-Transport-Security | Forces HTTPS connections |
| Content-Security-Policy | Prevents XSS and injection attacks |
| X-Frame-Options | Prevents clickjacking |
| X-Content-Type-Options | Prevents MIME sniffing |
| Referrer-Policy | Controls referrer information |
| Permissions-Policy | Controls browser features |
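A presence check for the six headers in the table might look like the Python sketch below. The helper names are hypothetical, and because this page does not say how header results map to letter grades, the sketch only reports which headers are present or missing.

```python
SECURITY_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
]


def audit_headers(headers: dict[str, str]) -> dict[str, bool]:
    """Report, per security header, whether the response includes it."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    return {header: header.lower() in present for header in SECURITY_HEADERS}


def missing_headers(headers: dict[str, str]) -> list[str]:
    """List the security headers absent from the response, in table order."""
    return [header for header, ok in audit_headers(headers).items() if not ok]
```

A fuller audit would also inspect header values, for example a very short `max-age` in Strict-Transport-Security, which this sketch does not attempt.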
Performance Metrics
We measure real page load performance and grade sites A+ to F:
Load Time
Total time until page is fully loaded
DOM Content Loaded
Time until HTML is parsed and ready
First Paint
Time until first pixel is rendered
First Contentful Paint
Time until first content appears
Performance benchmarking compares your site against others with similar tech stacks.
DNS & Infrastructure
We query DNS records to identify infrastructure providers:
Email Provider
Google Workspace, Microsoft 365, Zoho, custom SMTP
Detected via MX records
DNS Provider
Cloudflare, Route53, Google Cloud DNS, DNSimple
Detected via NS records
Email Security
SPF, DKIM, DMARC policy detection
Detected via TXT records
Hosting
Vercel, Netlify, AWS, GCP, Render
Detected via CNAME/A records
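As an illustration of record-based detection, the Python sketch below classifies an email provider from MX hostnames and checks TXT records for an SPF policy. It assumes the DNS answers have already been retrieved; the suffix map is a small hypothetical sample, not StackFox's provider list.

```python
# Illustrative MX suffixes only; real detection would need a much fuller map.
MX_SUFFIXES = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
    "zoho.com": "Zoho",
}


def email_provider(mx_hosts: list[str]) -> str:
    """Guess the email provider from MX hostnames; fall back to 'custom SMTP'."""
    for host in mx_hosts:
        name = host.rstrip(".").lower()  # drop the trailing root dot
        for suffix, provider in MX_SUFFIXES.items():
            if name == suffix or name.endswith("." + suffix):
                return provider
    return "custom SMTP"


def has_spf(txt_records: list[str]) -> bool:
    """Return True if any TXT record is an SPF policy (starts with v=spf1)."""
    return any(
        record.strip().strip('"').lower().startswith("v=spf1")
        for record in txt_records
    )
```

For example, `email_provider(["aspmx.l.google.com."])` returns `"Google Workspace"`. DKIM and DMARC live at other record names (a selector under `_domainkey`, and `_dmarc`) and would need separate lookups.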
API Services
We monitor network requests during page load to detect external services:
- Payment APIs: Stripe, PayPal, Square endpoints
- Auth APIs: Auth0, Clerk, Supabase authentication calls
- Analytics: Google Analytics, Segment, Amplitude events
- CDN/Assets: Cloudflare, Fastly, imgix image delivery
- WebSockets: Real-time connections for live features
Subdomains
We discover subdomains using Certificate Transparency logs.
When a site requests an SSL/TLS certificate, the issued certificate is logged publicly, along with the hostnames it covers. We query these logs to find the site's subdomains.
This reveals the site's infrastructure architecture and services.
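Once certificate hostnames have been pulled from a log, reducing them to a subdomain list is straightforward. The Python sketch below assumes the names are already available as strings, however they were obtained; the function name and the `example.com` inputs are hypothetical.

```python
def extract_subdomains(cert_names: list[str], domain: str) -> list[str]:
    """Return the unique subdomains of `domain` found among certificate names."""
    domain = domain.lower().rstrip(".")
    found = set()
    for name in cert_names:
        name = name.strip().lower().rstrip(".")
        if name.startswith("*."):
            name = name[2:]  # a wildcard certificate still reveals its parent name
        # Require a dot boundary so "notexample.com" does not match "example.com".
        if name.endswith("." + domain):
            found.add(name)
    return sorted(found)
```

The apex domain itself is excluded, and duplicates that differ only in letter case collapse into one entry.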