By @TheStackFox · February 2026
WebMCP: Your Website Is Now a Function Call
Google and Microsoft just shipped the first browser implementation of WebMCP, a proposed web standard that lets websites tell AI agents what they can do instead of making agents figure it out by staring at screenshots. I think this is a big deal. Here's why.
The problem it solves
AI agents interact with websites like a blindfolded person using a screen reader from 2004. They take screenshots, parse DOM trees, guess which button does what, and simulate clicks. Three approaches exist today, and none of them are great:
- Screenshot-based. Take a screenshot, feed it to a vision model, identify UI elements, click coordinates. Expensive in tokens, brittle across layouts, breaks every time a site redesigns.
- DOM scraping. Parse the HTML tree, find relevant elements, simulate interaction via Playwright or Puppeteer. Better than screenshots but still fragile. Class names change. SPAs load dynamically. Shadow DOMs hide things.
- Proprietary integrations. Build a bespoke agent API per platform. Works great until you need to support the other 200 million websites.
All three are reverse-engineering the UI to guess at what the application actually does. It's like reading a restaurant menu by photographing it and running OCR, when the waiter is standing right there and would happily just tell you what's on it.
What WebMCP actually is
WebMCP (Web Model Context Protocol) is a browser API being incubated at the W3C Web Machine Learning Community Group, jointly developed by engineers at Google and Microsoft. It lets web developers expose their app's functionality as structured, callable tools.
Put simply: a web page that implements WebMCP becomes an MCP server running in the browser tab.
Instead of an agent needing to identify input fields on a travel site, type into them, and click "Search," the site registers a `searchFlights` tool. The agent calls it with `{ origin: "SFO", destination: "JFK", date: "2026-03-15" }` and gets structured results back.
The API lives at `navigator.modelContext` and comes in two flavors: declarative (HTML) and imperative (JavaScript).
Declarative: annotate your forms
If you already have HTML forms, you're most of the way there. Add a few attributes:
```html
<form toolname="searchFlights"
      tooldescription="Search available flights by route and date">
  <input name="origin" type="text" required
         toolparamdescription="Departure airport code (e.g. SFO)" />
  <input name="destination" type="text" required
         toolparamdescription="Arrival airport code (e.g. JFK)" />
  <input name="date" type="date" required
         toolparamdescription="Travel date" />
  <button type="submit">Search</button>
</form>
```

The browser reads the form structure and generates a tool schema from it. `toolname` and `tooldescription` are what the agent sees. Individual inputs can carry `toolparamdescription` for richer context than the field name alone. The existing `name`, `type`, and `required` attributes map to the schema. Your form still works for humans. It just also speaks agent now.
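The draft doesn't publish the exact schema the browser derives, but based on the mapping just described (`name` becomes a property, `required` feeds the required list, `toolparamdescription` becomes the description), it plausibly looks like the output of this sketch. `formFieldsToSchema` is a hypothetical helper for illustration, not part of the WebMCP API:

```javascript
// Sketch: how a browser might derive a JSON Schema from the annotated
// form above. formFieldsToSchema is a hypothetical helper, not part of
// the WebMCP API; it just illustrates the attribute-to-schema mapping.
function formFieldsToSchema(fields) {
  const properties = {};
  const required = [];
  for (const field of fields) {
    properties[field.name] = {
      // Text and date inputs both map to JSON Schema strings here;
      // the real mapping for other input types is up to the browser.
      type: 'string',
      description: field.toolparamdescription,
    };
    if (field.required) required.push(field.name);
  }
  return { type: 'object', properties, required };
}

const schema = formFieldsToSchema([
  { name: 'origin', required: true, toolparamdescription: 'Departure airport code (e.g. SFO)' },
  { name: 'destination', required: true, toolparamdescription: 'Arrival airport code (e.g. JFK)' },
  { name: 'date', required: true, toolparamdescription: 'Travel date' },
]);
// schema.required -> ['origin', 'destination', 'date']
```

The result is the same shape as the `inputSchema` you'd hand-write in the imperative API below, which is the point: the form already contains the contract.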
`toolautosubmit` lets agents submit without user confirmation. Without it, submission requires consent. The default is the safe path.
There are CSS pseudo-classes too (`:tool-form-active`, `:tool-submit-active`) for styling forms differently when an agent is driving, so users get visual feedback.
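For example, a site might outline a form while an agent is filling it and dim the submit button while an agent-initiated submission is pending. The selector names come from the proposal; the styling choices here are just illustrative:

```css
/* Highlight a form while an agent is actively driving it */
form:tool-form-active {
  outline: 2px solid #4a90d9;
}

/* Dim the submit button while an agent-initiated submit is pending */
button:tool-submit-active {
  opacity: 0.6;
}
```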
Imperative: full control via JavaScript
For anything beyond form submission, you register tools in JS:
```js
if ('modelContext' in navigator) {
  navigator.modelContext.registerTool({
    name: 'add_to_cart',
    description: 'Add a product to the shopping cart by product ID and quantity',
    inputSchema: {
      type: 'object',
      properties: {
        productId: {
          type: 'string',
          description: 'The unique identifier of the product',
        },
        quantity: {
          type: 'number',
          description: 'The number of items to add',
        },
      },
      required: ['productId', 'quantity'],
    },
    async execute({ productId, quantity }, agent) {
      const response = await fetch('/api/cart', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ productId, quantity }),
      })
      const result = await response.json()
      return { content: [{ type: 'text', text: JSON.stringify(result) }] }
    },
  })
}
```

The API surface is small. Three methods:
- `navigator.modelContext.registerTool(descriptor)` adds a tool
- `navigator.modelContext.unregisterTool(name)` removes one
- `navigator.modelContext.provideContext(options)` replaces the entire registry at once
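In a single-page app, `provideContext` is the natural fit: swap the whole registry on route changes so agents only see tools relevant to the current view. A minimal sketch, with route names and tool lists invented for illustration, and the API object passed in so the logic can run outside a browser:

```javascript
// Sketch: swap the exposed toolset on SPA route changes so agents only
// see tools relevant to the current view. Route names and tool lists
// are invented for illustration.
const toolsByRoute = {
  '/search': [
    { name: 'searchFlights', description: 'Search flights', inputSchema: { type: 'object' }, async execute() { /* ... */ } },
  ],
  '/cart': [
    { name: 'add_to_cart', description: 'Add item to cart', inputSchema: { type: 'object' }, async execute() { /* ... */ } },
  ],
};

function syncToolsForRoute(modelContext, route) {
  const tools = toolsByRoute[route] ?? [];
  // provideContext replaces the whole registry at once, which is simpler
  // than pairing registerTool/unregisterTool calls per view.
  modelContext.provideContext({ tools });
  return tools.map((t) => t.name);
}

// In the browser you'd wire it to the real API object:
// if ('modelContext' in navigator) syncToolsForRoute(navigator.modelContext, location.pathname);
```

Passing `modelContext` as a parameter instead of touching `navigator` directly also makes the registry logic trivially unit-testable.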
Each tool descriptor has a `name`, a `description`, a JSON Schema `inputSchema`, and an `execute` function. `execute` receives the parsed inputs and an `agent` object, and returns a result with a `content` array (the same shape as MCP responses).
The `agent` object has one method worth knowing: `agent.requestUserInteraction()`. It lets a tool pause and ask the user for confirmation mid-flow:
```js
async execute({ amount }, agent) {
  const confirmed = await agent.requestUserInteraction(async () => {
    return confirm(`Transfer $${amount}?`);
  });
  if (!confirmed) {
    return { content: [{ type: 'text', text: 'User cancelled transfer.' }] };
  }
  // proceed with transfer
}
```

Tool calls execute sequentially. The browser runs one at a time, which prevents race conditions and gives users a window to intervene.
How this relates to Anthropic's MCP
This is where people get confused. The naming overlap is deliberate (WebMCP builds on the same conceptual foundation) but the two protocols solve different problems.
Anthropic's MCP connects agents to backend services. JSON-RPC over stdio or HTTP. Your MCP server runs on a server, exposes tools, agents call them remotely. It's how Claude talks to your database or your GitHub repo.
WebMCP connects agents to browser-based interfaces. Tools run as client-side JavaScript inside a real browser tab, with access to the DOM, the user's session, cookies, everything. No separate server needed.
| | MCP | WebMCP |
|---|---|---|
| Runs where | Server / backend | Browser tab |
| Transport | JSON-RPC (stdio/HTTP) | postMessage (in-browser) |
| Auth | API keys, OAuth | Existing browser session |
| Use case | Backend services, APIs, databases | Web applications, UIs |
| Implemented by | Backend developers | Frontend developers |
MCP is for when you want an agent to query your Postgres database. WebMCP is for when you want an agent to book a flight on United.com using your logged-in session.
Security
The browser mediates everything. Tool calls route through the browser, which can show the user what's being called and with what parameters, require consent before execution, enforce sequential execution (no parallel bulk actions), and distinguish agent actions from human ones via `SubmitEvent.agentInvoked`.
This is a big upgrade over the status quo, where agents with browser access can click anything with zero mediation.
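The `SubmitEvent.agentInvoked` flag also makes agent traffic observable to the site itself. A sketch of how a page might tag submissions, with the classification pulled into a plain helper so it's testable outside a browser; `trackEvent` is a placeholder for whatever analytics call you use:

```javascript
// Sketch: distinguish agent-driven form submissions from human ones
// using SubmitEvent.agentInvoked. classifySubmit is a plain helper so
// the logic runs outside a browser; trackEvent is a placeholder.
function classifySubmit(event) {
  return event.agentInvoked ? 'agent' : 'human';
}

function onSubmit(event, trackEvent) {
  trackEvent('form_submit', {
    form: event.target?.getAttribute?.('toolname') ?? 'unknown',
    actor: classifySubmit(event),
  });
}

// Browser wiring, guarded so this file also loads outside a browser:
if (typeof document !== 'undefined') {
  document.addEventListener('submit', (e) => onSubmit(e, console.log));
}
```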
Tools can carry a `destructiveHint` annotation for dangerous operations ("delete account," "transfer funds"). But it's advisory: the browser can prompt for confirmation, but nothing requires it to. A malicious site could simply not mention that a tool deletes your account.
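As a sketch, a destructive tool descriptor might look like this. The `annotations` shape follows MCP's tool annotations; WebMCP's final field names may differ, and the endpoint and confirmation phrase are invented:

```javascript
// Sketch: a tool descriptor carrying a destructive hint. The annotations
// shape follows MCP's tool annotations; WebMCP's final field names may
// differ. The /api/account endpoint and DELETE phrase are invented.
const deleteAccountTool = {
  name: 'delete_account',
  description: 'Permanently delete the current user account',
  inputSchema: {
    type: 'object',
    properties: {
      confirmPhrase: { type: 'string', description: 'User must type DELETE to confirm' },
    },
    required: ['confirmPhrase'],
  },
  annotations: {
    // Advisory only: the browser *can* use this to force a confirmation
    // prompt, but nothing in the draft requires it to.
    destructiveHint: true,
  },
  async execute({ confirmPhrase }) {
    // Belt-and-suspenders: don't rely on the hint alone for anything
    // irreversible; enforce your own confirmation in the tool.
    if (confirmPhrase !== 'DELETE') {
      return { content: [{ type: 'text', text: 'Confirmation phrase did not match.' }] };
    }
    const res = await fetch('/api/account', { method: 'DELETE' });
    return { content: [{ type: 'text', text: `Account deletion status: ${res.status}` }] };
  },
};
```

The in-tool guard matters precisely because the hint is advisory: a site should treat its own `execute` as the last line of defense, not the browser's prompt.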
The spec acknowledges prompt injection as a real risk. The nightmare scenario: an agent reads your private data on one page, encounters attacker-controlled content, and gets tricked into calling a tool on another page to exfiltrate it. Credential handling, rate limiting, and abuse prevention are all left to individual implementations. The W3C group is working on these, but they're not solved.
Why you should care if you build web apps
Your existing web app becomes agent-accessible without building a separate API. The website is the interface. You don't need to stand up an MCP server alongside it.
You control what agents can do. Instead of agents scraping your site and guessing at capabilities (often wrong), you publish a contract. Here are the actions, here are the parameters, here's what you get back.
One codebase serves both humans and agents. Same checkout flow, same inventory check, same booking system. No drift between your "human API" and "agent API."
Auth comes free. Tools execute in the user's browser session. If they're logged in, the agent inherits that. No OAuth dance, no API keys. Anyone who's spent a week wiring up agent auth knows why this matters.
Who gets hurt, who benefits
If WebMCP gets traction, some existing markets compress and some new ones open up.
On the losing side:

- RPA vendors. UiPath and Automation Anywhere built entire businesses on the brittleness of web UIs, which is to say, they monetized the problem WebMCP is trying to fix. If sites expose structured tools natively, pixel-level screen scraping loses its reason to exist. RPA won't disappear (legacy enterprise apps aren't adopting WebMCP anytime soon), but the growth ceiling comes down.
- Browser automation infra. Companies like Browserless and BrowserBase sell headless browser capacity for AI agent web interaction. That market shrinks if agents can call `addToCart()` directly instead of navigating a checkout flow.
- Scraping-as-a-service. Probably the least affected: scraping still matters for sites that don't adopt WebMCP, and most sites won't for a long time.
New opportunities:

- Agent commerce middleware. Identity verification, transaction auditing, spend controls, dispute resolution for agent-initiated purchases. That category doesn't exist yet.
- WebMCP-as-a-service. Most sites won't rewrite their frontend to add `toolname` attributes, so whoever builds tooling that auto-generates tool registrations from existing site structure captures a real market.
- Agent analytics. If `SubmitEvent.agentInvoked` tells you an agent submitted a form, you can track agent vs. human conversion rates, funnel behavior, and which tools agents actually call. Nobody has this data today.
- Tool discovery and registries. The spec currently requires visiting a page to discover its tools. Whoever builds the canonical index of which sites expose which tools becomes essential infrastructure for the agent ecosystem. (This is something we're actively working on at StackFox.)
Farther out, a few things to watch.

If an agent books your flight, it never saw the banner ad. Impression-based advertising on transactional pages loses value as agent-mediated transactions grow. Performance marketing (affiliate, CPA) may hold up since agents still complete conversions, but attribution gets weird when the "user" is an LLM.

On the SEO front, if agents pick services based on tool schemas instead of search rankings, traditional SEO matters less for agent traffic. Writing good tool descriptions becomes its own discipline. SEO for function calls, basically. Expect "WebMCP Optimizer" SaaS products before year-end.

And there's the platform concentration question. Google and Microsoft co-authoring the browser API for AI agents gives them structural control over agent-web interaction. If Chrome is where agents run and WebMCP is how they interact, Google is the gatekeeper. Apple's silence is notable.
Where things stand (February 2026)
- Spec: W3C Community Group Draft Report. Not a standard. The API surface will change.
- Browser support: Chrome 146 DevTrial, behind the "Experimental Web Platform Features" flag at `chrome://flags`. Stable release expected ~March 10, 2026. Separate Early Preview Program signup for docs and demos.
- Other browsers: No public signals from Firefox or Safari.
- Polyfill: MCP-B is a reference implementation that polyfills `navigator.modelContext` today and bridges to the MCP wire format. Start here if you want to experiment now.
- Production ready? No. Don't ship this to users.
Google and Microsoft are the primary authors. The W3C Web Machine Learning Community Group is running formal review. Expect broader announcements at Google Cloud Next or I/O later this year.
Open questions
The chicken-and-egg problem is real. Agents won't prioritize WebMCP until sites implement it. Sites won't bother until agents use it. Chrome's market share (67-73%, depending on who you ask) gives it enough gravity to maybe force the issue, but cross-browser support will determine whether this becomes a real standard or another Chrome-only thing.
Tool discovery needs work. You have to visit a page to find out what tools it offers. There's talk of manifest files for out-of-band discovery, but nothing concrete. Without it, agents still need to navigate to pages before they know what's available, which undercuts the efficiency gains.
The declarative API is elegant but limited. Form annotation works for simple cases. Most interesting agent interactions will need the imperative API. Declarative is a good onramp, not the destination.
Privacy is the sleeper issue. `SubmitEvent.agentInvoked` explicitly tells sites that an agent is acting on your behalf. Some sites will use that flag to help you. Others will use it to charge you more, throttle you, or block you entirely. Airlines are probably already thinking about this.
My take
WebMCP applies the core insight of Anthropic's MCP (structured tool definitions beat unstructured text) to the browser. The web was built for humans to read and agents to struggle with. This gives developers a way to speak both languages from the same codebase.
Whether it actually wins depends on the usual web standards dynamics: cross-browser adoption, developer tooling, and whether the chicken-and-egg problem resolves before everyone moves on to the next thing. The backing from Google and Microsoft gives it a better shot than most proposals get.
If you build web applications, read the proposal and think about which of your features map to tool definitions. You don't need to ship anything today. But the teams that have a plan will move faster when this flag flips to default-on.