
FourA shows up in Dawn — and that's the rise of something

Dawn shipped a FourA integration this week. Behind every agent answer that touches the live web, there's now an extraction call. Here's the shape that's emerging.

An engineer opens Dawn and asks: "Scrape https://topstartups.io/ and give me the first 10 startups, including names, descriptions, HQ, year founded, URLs, social pages, formatted as a table."

The agent thinks for a moment, fetches the page, parses the listings, follows each startup's profile, and returns the table. Ten rows. Every column populated. Pogo, Auctor, Scalify, Omnea, Rivan, Listen Labs, Doppel, Blossom, Avoca, Traba. HQs across Brooklyn, New York, London, San Francisco, Remote. LinkedIn for most. Founded years 2020 to 2026.

That table was the output of a handful of FourA calls.

This week Dawn shipped FourA as a first-class tool inside their agent platform. It sits in their integrations grid next to Notion, GitHub, and Google Drive. Agents granted FourA access can fetch a public web page or HTTP endpoint, parse the response (including JSON), submit a form, check reachability, and pull specific text or links out of what comes back. Each agent has explicit access or it doesn't. Per-agent governance, no "every agent gets the internet" footgun.

FourA in Dawn's integrations grid, alongside OneDrive, MailJet, Linear, Jira, and Trello

What's interesting isn't that an agent can hit a URL. Web search has existed in agent platforms for a year. What's interesting is the shape of the tool that's emerging.

Web search and URL extraction are different jobs. Search is for "what does the internet say about X?" Broad, generative, summary-grade information. Extraction is for "here is the URL or endpoint, fetch it and give me the structured answer." Different reliability requirements, different cost profiles, different failure modes. Mixing them in one tool produces a mediocre answer for both.

Dawn's integration treats them as separate. They have a /web-research capability for the broad job. FourA is for the targeted job. An agent reaches for the right tool based on what it actually needs. And that's the maturation pattern we're starting to see across agent platforms in 2026: extraction is graduating from a bolt-on to search into its own primitive.

For the platform engineer reading this

Dawn exposes FourA as eight named tools, each mapping to a common extraction pattern:

  • foura_fetch_page for HTML and text pages
  • foura_extract_text for clean readable content
  • foura_extract_links for navigation, forms, scripts, and styles
  • foura_fetch_json for API endpoints
  • foura_head_url for headers, status, redirects
  • foura_probe_site for fast reachability checks
  • foura_submit_form for login-free form submissions
  • foura_single_request for arbitrary HTTP

The agent picks based on what the question demands. The topstartups query above used three of them in sequence: a fetch, an extract, a follow-up.
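One way to picture the eight tools is as a small registry the platform hands to the model. The sketch below is illustrative only: the tool names come from the list above, but the `ToolSpec` type, the description strings, and the `describe` helper are assumptions for this example, not Dawn's or FourA's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool definition: a name plus a one-line description."""
    name: str
    description: str


# Names are from the post; descriptions paraphrase the list above.
FOURA_TOOLS: List[ToolSpec] = [
    ToolSpec("foura_fetch_page", "Fetch an HTML or text page."),
    ToolSpec("foura_extract_text", "Return clean, readable content from a page."),
    ToolSpec("foura_extract_links", "List navigation, form, script, and style links."),
    ToolSpec("foura_fetch_json", "Call an API endpoint and parse the JSON."),
    ToolSpec("foura_head_url", "Return headers, status, and redirects."),
    ToolSpec("foura_probe_site", "Run a fast reachability check."),
    ToolSpec("foura_submit_form", "Submit a form that needs no login."),
    ToolSpec("foura_single_request", "Send an arbitrary HTTP request."),
]

TOOLS_BY_NAME: Dict[str, ToolSpec] = {t.name: t for t in FOURA_TOOLS}


def describe(names: List[str]) -> List[str]:
    """Render 'name: description' lines for the tools offered to an agent."""
    return [f"{n}: {TOOLS_BY_NAME[n].description}" for n in names]
```

Keeping each description to one focused sentence is the point: the model chooses between tools by reading them, so overlap between descriptions tends to show up as wrong tool choices.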

The integration is straightforward enough to do in a day. Two request flavors sit underneath: a direct mode with browser-grade fingerprinting for sites that don't aggressively gate, and a proxy-routed mode for everything else. Both share the same request shape: URL, optional headers and body, optional response parsing. The agent picks based on what the target site demands.
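As a rough sketch of that shared request shape, assuming field names of our own choosing: `ExtractionRequest`, `Mode`, `to_payload`, the `method` field, and the example `parse` values below are hypothetical and do not describe FourA's actual wire format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Mode(str, Enum):
    DIRECT = "direct"  # browser-grade fingerprinting, no proxy
    PROXY = "proxy"    # proxy-routed


@dataclass
class ExtractionRequest:
    """Hypothetical request shape: URL, optional headers/body, optional parsing."""
    url: str
    method: str = "GET"  # assumption; the post does not name a method field
    headers: Dict[str, str] = field(default_factory=dict)
    body: Optional[str] = None
    parse: Optional[str] = None  # e.g. "json" or "text" (illustrative values)
    mode: Mode = Mode.PROXY

    def to_payload(self) -> dict:
        """Serialize the request, omitting optional fields that were not set."""
        payload = {"url": self.url, "method": self.method, "mode": self.mode.value}
        if self.headers:
            payload["headers"] = dict(self.headers)
        if self.body is not None:
            payload["body"] = self.body
        if self.parse is not None:
            payload["parse"] = self.parse
        return payload
```

Because both modes accept the same fields, switching between them is a one-field change rather than a second code path.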

The contract a platform offers its agents tends to look like:

  • A small set of capabilities (fetch / extract / probe / submit), each with a focused tool definition the agent can reach for
  • Default to proxy mode, switch to direct mode when latency or cost matters
  • Per-agent permission so the platform's customers retain governance
  • Structured response parsing exposed as a tool param, not buried in a system prompt
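The permission and mode rules in that contract can be sketched in a few lines. Everything named here (`AgentPolicy`, `ToolNotPermitted`, `authorize`, `choose_mode`, the `prefer_low_latency` flag) is hypothetical and only illustrates the idea; it is not how Dawn implements governance.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy: which tools are granted, and a mode hint."""
    agent_id: str
    allowed_tools: Set[str] = field(default_factory=set)
    prefer_low_latency: bool = False  # assumption: one flag stands in for latency/cost


class ToolNotPermitted(Exception):
    """Raised when an agent calls a tool it has not been granted."""


def authorize(policy: AgentPolicy, tool_name: str) -> None:
    """Deny by default: a tool is callable only if explicitly granted."""
    if tool_name not in policy.allowed_tools:
        raise ToolNotPermitted(f"{policy.agent_id} may not call {tool_name}")


def choose_mode(policy: AgentPolicy) -> str:
    """Default to proxy mode; use direct mode when latency or cost matters."""
    return "direct" if policy.prefer_low_latency else "proxy"
```

The deny-by-default check is what keeps governance with the platform's customers: an agent with an empty `allowed_tools` set cannot reach the web at all.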

But the part most platform engineers underestimate is what happens at the tail. The 80% case (a fetch succeeds in 200ms, returns clean HTML) is the easy part. The other 20% (sites that gate on TLS fingerprint, that slip a JS challenge into the response, that return a 403 to requests from cloud IP ranges) is what determines whether your agent ships a correct answer or a hallucinated one. We rebuilt our request path for exactly that tail, and the difference between "feels reliable" and "actually reliable" is most of the work.
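To make the tail concrete, here is a minimal sketch of one defensive pattern: treat a challenge page as a failure even when it arrives with a 200, try the next mode, and raise an explicit error rather than hand blocked content to the model. This is not FourA's request path. The `Response` type, the injected `fetch` callable, the marker strings, and the direct-then-proxy ordering are all assumptions chosen for illustration; real challenge pages vary and change over time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Response:
    status: int
    body: str


class ExtractionFailed(Exception):
    """Raised so the agent sees an explicit failure, not a challenge page."""


# Illustrative markers only; a real detector would need to be far more careful.
CHALLENGE_MARKERS = ("just a moment", "verify you are human")


def looks_blocked(resp: Response) -> bool:
    """Flag block statuses and 200 responses that contain a challenge page."""
    if resp.status in (403, 429):
        return True
    lowered = resp.body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def fetch_with_escalation(
    url: str,
    fetch: Callable[[str, str], Response],  # hypothetical: fetch(url, mode)
    modes: Sequence[str] = ("direct", "proxy"),
) -> Response:
    """Try each mode in order and return the first response that is not blocked."""
    last: Optional[Response] = None
    for mode in modes:
        resp = fetch(url, mode)
        if not looks_blocked(resp):
            return resp
        last = resp
    status = last.status if last is not None else "n/a"
    raise ExtractionFailed(f"{url} blocked in all modes (last status {status})")
```

The important design choice is the final `raise`: an agent that receives an explicit failure can say so, while an agent that receives a challenge page as if it were content is likely to summarize it or guess.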

So if you run an agent platform and your customers keep asking how their agents could "just check this URL," that's the pattern. Docs are at /docs. We're happy to walk you through it.

For everyone else

You won't see any of this. You'll just notice that when you ask an AI assistant a question that requires looking at a real web page right now, it answers correctly instead of guessing or apologising.

That's the user-facing outcome of an extraction primitive reliable enough to live next to GitHub and Google Drive in an integrations grid. It stops being a research project. It starts being plumbing.

Why this matters

Six months ago, an agent that needed to read a webpage was a custom build. Bespoke prompts, brittle scrapers, hand-rolled retries, a 60% success rate on a good day. The shape was wrong because the layer didn't exist yet. And the sites the agent was hitting kept moving. Anti-bot tech shifted from static signals to behavioral checks, so the duct-taped scrapers degraded faster than teams could patch them.

Now the layer is forming. Dawn picked it up and shipped an integration. We expect more agent platforms to follow this year, and we expect the contract to converge: a dedicated tool for search, a dedicated tool for extraction, per-agent governance, predictable cost.

We're early. But this is what the rise of something looks like. When a capability stops being a project and starts being a plug.

If you build an agent platform and want to ship the same shape, say hello. If you build agents on Dawn, FourA is already there. Toggle it on.