Validate Rules Now Decide What Counts as Success

Declare which responses count as success using validate rules. Non-200 responses you accept now bill correctly and show as success in your Activity feed.

Your request's validate rules now drive how every outcome gets classified. Declare a 403 acceptable, and a delivered 403 counts as success, bills as success, and lands in your Activity feed alongside your 200s.

This sounds small. It changes how you measure scraping accuracy at scale.

How It Works

Every request to FourA gets one of seven outcomes that decide billing and analytics. Only success is billable. The rest split by who owns the failure:

  • application_fail and application_error when the target site refused the request or returned an error body
  • client_error when the request you sent was malformed
  • service_fail, service_error, and rate_limit when something on our side blocked the request
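
If you tally outcomes on your side, the grouping above can be sketched as a small lookup. This is a rough illustration, not part of the API: the seven outcome names come from this post, while the owner labels ("target", "client", "service") and the helper names are made up for the example.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    APPLICATION_FAIL = "application_fail"
    APPLICATION_ERROR = "application_error"
    CLIENT_ERROR = "client_error"
    SERVICE_FAIL = "service_fail"
    SERVICE_ERROR = "service_error"
    RATE_LIMIT = "rate_limit"


# Owner labels are illustrative names, not values returned by the API.
OWNER = {
    Outcome.APPLICATION_FAIL: "target",
    Outcome.APPLICATION_ERROR: "target",
    Outcome.CLIENT_ERROR: "client",
    Outcome.SERVICE_FAIL: "service",
    Outcome.SERVICE_ERROR: "service",
    Outcome.RATE_LIMIT: "service",
}


def is_billable(outcome: Outcome) -> bool:
    """Only success is billable."""
    return outcome is Outcome.SUCCESS


def failures_by_owner(outcomes):
    """Count non-success outcomes per owner from a list of outcome strings."""
    parsed = (Outcome(value) for value in outcomes)
    return Counter(OWNER[o] for o in parsed if o is not Outcome.SUCCESS)
```

Calling failures_by_owner on a batch of outcome strings gives a quick view of whether failures are coming from the target, from your requests, or from the service.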

Before this change, success meant exactly one thing: HTTP 200. A 403 was always application_fail, even if you knew that 403 was the response you wanted. (Some sports data APIs return 403 for geo-fenced markets, and that's the signal your code is waiting on.)

Now your validate block decides. The request runs your rules during execution. If the response satisfies them, the outcome is success.

curl -X POST "https://eu.api.foura.ai/v1/request" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/api/feed",
    "unblocker": true,
    "validate": {
      "status": { "accept": [200, 403] },
      "data": { "fail": ["captcha", "Access Denied"] }
    }
  }'

This treats 200 and 403 as valid status codes. If the body contains a CAPTCHA marker or an access-denied string, the request fails regardless of status. A 200 or 403 whose body contains neither string is success.

Two rules to remember:

  1. Without validate, behavior is unchanged. Requests that don't declare validation still bill on HTTP 200 only. You opt in.
  2. validate runs both ways. Accept rules pass; fail rules reject. They compose. So you can accept [200, 403] and still fail when the body contains the wrong content.
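
To make the two rules concrete, here is a minimal Python sketch of how a delivered response might be classified. It is a mental model, not our implementation. The function name classify is hypothetical, and the sketch assumes plain case-sensitive substring matching for data.fail and a 200-only default when status.accept is omitted; the post does not spell out either detail.

```python
def classify(status, body, validate=None):
    """Return 'success' or 'application_fail' for a delivered response."""
    if validate is None:
        # Rule 1: no validate block, so only HTTP 200 counts as success.
        return "success" if status == 200 else "application_fail"

    status_rules = validate.get("status", {})
    data_rules = validate.get("data", {})

    # Assumption: omitting status.accept keeps the 200-only default.
    if status not in status_rules.get("accept", [200]):
        return "application_fail"

    # Rule 2: fail rules still reject a response whose status was accepted.
    # Assumption: patterns are matched as case-sensitive substrings.
    if any(marker in body for marker in data_rules.get("fail", [])):
        return "application_fail"

    return "success"


# The validate block from the curl example above.
RULES = {
    "status": {"accept": [200, 403]},
    "data": {"fail": ["captcha", "Access Denied"]},
}
```

With RULES, a 403 carrying geo-fence data classifies as success, while the same 403 without a validate block stays application_fail.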

Impact

The shift matters most for teams whose targets return non-200 responses they actually want.

Examples from requests we see daily:

  • Sports data APIs that return 403 for geo-fenced markets (still useful data, still worth logging as success)
  • E-commerce search endpoints that return 404 when a SKU is out of stock (a signal your code reads, not a failure)
  • Streaming and partial-content APIs that return 206

Before the change, those teams ran their own bookkeeping on top of our Activity logs. They couldn't trust the outcome column because their definition of success didn't match ours. They were billed against a number they didn't actually care about.

Now the column reflects reality. The Activity tab in your Dashboard shows what you defined as success, not what we guessed at. Your billed totals match what you'd count yourself. One caveat: the change applies going forward only, so older Activity rows keep their original classification.

The practical effect on a scraping job: fewer reconciliation steps between your pipeline and our invoice. If you were already running validation on the response body after the fact, you can move that contract into the request itself and stop maintaining a parallel set of pass/fail rules outside our API. One definition of whether a request earned its place in your dataset, instead of two that disagreed.
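
As a sketch of that migration, suppose your pipeline already keeps a set of statuses it wants and a list of body markers it rejects. The helper below (build_request_body, a hypothetical name) turns those into a request body using the url, validate, status.accept, and data.fail fields shown in this post. It only builds the JSON; sending it to the endpoint is left out.

```python
import json


def build_request_body(url, wanted_statuses, bad_markers):
    """Fold client-side pass/fail checks into a validate block."""
    return {
        "url": url,
        "validate": {
            "status": {"accept": sorted(wanted_statuses)},
            "data": {"fail": list(bad_markers)},
        },
    }


# Checks that previously ran in the pipeline after the response came back.
# These values are examples, not recommendations.
WANTED_STATUSES = {200, 404}
BAD_MARKERS = ["captcha", "Access Denied"]

body = build_request_body(
    "https://example.com/api/feed", WANTED_STATUSES, BAD_MARKERS
)
print(json.dumps(body, indent=2))
```

Keeping the lists in one place means the pipeline and the request share a single definition of success.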

But we kept the safety net. If you don't pass a validate block, nothing changes. The classifier falls back to "200 means success" so requests that worked yesterday work the same way today.

For Power Users

validate accepts three rule sets that run independently: status, headers, and data. Each takes optional accept and fail lists.

curl -X POST "https://eu.api.foura.ai/v1/request" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/9876",
    "followRedirects": 5,
    "unblocker": true,
    "validate": {
      "status": { "accept": [200, 304] },
      "headers": { "accept": { "content-type": "application/json" } },
      "data": { "accept": ["\"price\":"], "fail": ["maintenance", "captcha"] }
    }
  }'

This requires:

  • Status is 200 or 304
  • Response advertises a JSON content type
  • Body contains a price field
  • Body does not contain a maintenance notice or a captcha trap

If any rule fails, the outcome is application_fail. If everything passes, it's success. The classifier runs inside the request itself, so you skip the round trip a separate validation step would cost.
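
For debugging a rule set locally, it can help to see which of the three rule sets a captured response would trip. The sketch below is a guess at the semantics, not the classifier we run. It assumes header names are case-insensitive, header values and data patterns match as substrings, and a data.accept list passes when any one pattern appears. It handles only headers.accept, and all function names are hypothetical.

```python
def _status_ok(status, rules):
    if "accept" in rules and status not in rules["accept"]:
        return False
    return status not in rules.get("fail", [])


def _headers_ok(headers, rules):
    # Assumption: names are case-insensitive, values match as substrings.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return all(
        expected.lower() in lowered.get(name.lower(), "")
        for name, expected in rules.get("accept", {}).items()
    )


def _data_ok(body, rules):
    # Assumption: any one accept pattern is enough; any fail pattern rejects.
    accept = rules.get("accept")
    if accept and not any(pattern in body for pattern in accept):
        return False
    return not any(pattern in body for pattern in rules.get("fail", []))


def failed_rule_sets(status, headers, body, validate):
    """Return the names of the rule sets the response does not satisfy."""
    checks = {
        "status": _status_ok(status, validate.get("status", {})),
        "headers": _headers_ok(headers, validate.get("headers", {})),
        "data": _data_ok(body, validate.get("data", {})),
    }
    return [name for name, ok in checks.items() if not ok]


def outcome(status, headers, body, validate):
    failed = failed_rule_sets(status, headers, body, validate)
    return "application_fail" if failed else "success"


# The validate block from the curl example above.
VALIDATE = {
    "status": {"accept": [200, 304]},
    "headers": {"accept": {"content-type": "application/json"}},
    "data": {"accept": ['"price":'], "fail": ["maintenance", "captcha"]},
}
```

Running failed_rule_sets against saved responses shows whether a rejection came from the status, the headers, or the body before you ship the rules.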

With followRedirects set to 5, as in the example above, the request follows up to five hops and then validates the final response. A bait-and-switch from a clean URL to a CAPTCHA gate fails cleanly instead of polluting your dataset.

And a tip from running our own scrapers: declare data.fail patterns aggressively. A 200 OK with a CAPTCHA inside it is the most common silent failure mode on protected sites. Treat the body as authoritative, not the status code.

For the full schema, the request reference lists every validate field and how each one composes.

What's Next

We're working on richer rule primitives: regex matchers for data, structured JSON-path predicates, and looser header matching. The principle stays the same. You declare what success looks like; the API honors it end to end, from the request through to your invoice.

When your scraper breaks, it should be loud about it. And when it works against rules you wrote yourself, that's a number you can actually trust.