An engineer opens Dawn and asks: "Scrape https://topstartups.io/ and give me the first 10 startups, including names, descriptions, HQ, year founded, URLs, social pages, formatted as a table."
The agent thinks for a moment, fetches the page, parses the listings, follows each startup's profile, and returns the table. Ten rows. Every column populated. Pogo, Auctor, Scalify, Omnea, Rivan, Listen Labs, Doppel, Blossom, Avoca, Traba. HQs across Brooklyn, New York, London, San Francisco, Remote. LinkedIn for most. Founded years 2020 to 2026.
That table was the output of a handful of FourA calls.
This week Dawn shipped FourA as a first-class tool inside their agent platform. It sits in their integrations grid next to Notion, GitHub, and Google Drive. Agents granted FourA access can fetch a public web page or HTTP endpoint, parse the response (including JSON), submit a form, check reachability, and pull specific text or links out of what comes back. Each agent has explicit access or it doesn't. Per-agent governance, no "every agent gets the internet" footgun.
What's interesting isn't that an agent can hit a URL. Web search has existed in agent platforms for a year. What's interesting is the shape of the tool that's emerging.
Web search and URL extraction are different jobs. Search is for "what does the internet say about X?" Broad, generative, summary-grade information. Extraction is for "here is the URL or endpoint, fetch it and give me the structured answer." Different reliability requirements, different cost profiles, different failure modes. Mixing them in one tool produces a mediocre answer for both.
Dawn's integration treats them as separate. They have a /web-research capability for the broad job. FourA is for the targeted job. An agent reaches for the right tool based on what it actually needs. And that's the maturation pattern we're starting to see across agent platforms in 2026: extraction is graduating from "search bolted-on" to its own primitive.
For the platform engineer reading this
Dawn exposes FourA as eight named tools, each mapping to a common extraction pattern:
foura_fetch_pagefor HTML and text pagesfoura_extract_textfor clean readable contentfoura_extract_linksfor navigation, forms, scripts, and stylesfoura_fetch_jsonfor API endpointsfoura_head_urlfor headers, status, redirectsfoura_probe_sitefor fast reachability checksfoura_submit_formfor login-free form submissionsfoura_single_requestfor arbitrary HTTP
The agent picks based on what the question demands. The topstartups query above used three of them in sequence: a fetch, an extract, a follow-up.
The integration is straightforward enough to do in a day. Two request flavors sit underneath: a direct mode with browser-grade fingerprinting for sites that don't aggressively gate, and a proxy-routed mode for everything else. Both share the same request shape: URL, optional headers and body, optional response parsing. The agent picks based on what the target site demands.
The contract a platform offers its agents tends to look like:
- A small set of capabilities (fetch / extract / probe / submit), each with a focused tool definition the agent can reach for
- Default to proxy mode, fall back to direct when latency or cost matters
- Per-agent permission so the platform's customers retain governance
- Structured response parsing exposed as a tool param, not buried in a system prompt
But the part most platform engineers underestimate is what happens at the tail. The 80% case (a fetch succeeds in 200ms, returns clean HTML) is the easy half. The other 20% (sites that gate on TLS fingerprint, that race a JS challenge into the response, that 403 on a cloud IP block) is what determines whether your agent ships a correct answer or a hallucinated one. We rebuilt our request path for exactly that tail, and the difference between "feels reliable" and "actually reliable" is most of the work.
So if you run an agent platform and your customers keep asking how their agents could "just check this URL," that's the pattern. Docs are at /docs. We're happy to walk you through it.
For everyone else
You won't see any of this. You'll just notice that when you ask an AI assistant a question that requires looking at a real web page right now, it answers correctly instead of guessing or apologising.
That's the user-facing outcome of an extraction primitive reliable enough to live next to GitHub and Google Drive in an integrations grid. It stops being a research project. It starts being plumbing.
Why this matters
Six months ago, an agent that needed to read a webpage was a custom build. Bespoke prompts, brittle scrapers, hand-rolled retries, a 60% success rate on a good day. The shape was wrong because the layer didn't exist yet. And the sites the agent was hitting kept moving. Anti-bot tech shifted from static signals to behavioral checks, so the duct-taped scrapers degraded faster than teams could patch them.
Now the layer is forming. Dawn picked it up and shipped an integration. We expect more agent platforms to follow this year, and we expect the contract to converge: a dedicated tool for search, a dedicated tool for extraction, per-agent governance, predictable cost.
We're early. But this is what the rise of something looks like. When a capability stops being a project and starts being a plug.
If you build an agent platform and want to ship the same shape, say hello. If you build agents on Dawn, FourA is already there. Toggle it on.