← All posts

How a Price Intelligence Firm Tracks 10,000 SKUs Daily

Tracking 10,000 product prices across 200+ competitor sites daily is a serious infrastructure challenge. Here's how one pricing firm solved it cleanly.

Price intelligence is the backbone of competitive e-commerce. Companies that track competitor prices in real time can adjust their own pricing dynamically, protect margins, and capture market share. But building a system that reliably monitors 10,000 product pages every day is a serious engineering challenge.

This post walks through how a typical price intelligence operation works, the technical hurdles involved, and how data collection APIs like FourA simplify the infrastructure layer.

The Scale of the Problem

A mid-size price intelligence firm might track:

  • 10,000 SKUs across 50 competitor websites
  • 3 price checks per SKU per day (morning, afternoon, evening)
  • That's 30,000 page fetches daily, across sites with different layouts, protection systems, and rendering requirements

At this scale, you can't afford manual maintenance. Every broken selector, blocked IP, or site redesign costs hours of engineering time and gaps in your data.

Architecture

1. Product Catalog

The system starts with a structured catalog: SKU identifiers mapped to competitor URLs and CSS selectors for price elements.

{
  "sku": "LAPTOP-X1-16GB",
  "targets": [
    {"site": "competitor-a.com", "url": "https://competitor-a.com/laptop-x1", "selector": ".price-current", "type": "single"},
    {"site": "competitor-b.com", "url": "https://competitor-b.com/products/12345", "selector": "[data-price]", "type": "browser"},
    {"site": "competitor-c.com", "url": "https://competitor-c.com/item/laptop-x1", "selector": ".product-price", "type": "proxy"}
  ]
}

Notice that each target specifies its own task type. Each site has different protection and rendering characteristics, so the pipeline matches each one to the cheapest fetch method that works for it rather than applying a one-size-fits-all strategy.
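One way the pipeline might consume this catalog is to flatten it into one job per (SKU, target) pair. A minimal sketch, assuming the catalog is stored as a JSON array of entries shaped like the example above (the file layout and helper name are assumptions, not part of FourA's API):

```python
import json

def load_catalog(path):
    """Load the SKU catalog and flatten it into one job per (SKU, target) pair."""
    with open(path) as f:
        entries = json.load(f)
    jobs = []
    for entry in entries:
        for target in entry["targets"]:
            # Each job carries the SKU plus the target's url/selector/type
            jobs.append({"sku": entry["sku"], **target})
    return jobs
```

Flattening up front makes the scheduler's job trivial: it only ever sees a flat list of independent fetch jobs, regardless of how many targets a SKU has.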

2. Collection Pipeline

A scheduler dispatches collection jobs in batches. Each job calls the FourA API:

import os
import requests

# Read the API key from the environment rather than hard-coding it
API_KEY = os.environ["FOURA_API_KEY"]

def collect_price(target):
    """Submit one collection task to the FourA API for a catalog target."""
    resp = requests.post(
        "https://eu.api.foura.ai/api/v1/tasks",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": target["url"],
            "type": target["type"],
            # Browser tasks wait for the price selector to render before returning
            "options": {"waitFor": target["selector"]} if target["type"] == "browser" else {},
        },
    )
    resp.raise_for_status()
    return resp.json()

The key insight: FourA handles proxy rotation, TLS fingerprinting, browser rendering, and retry logic. The collection pipeline only needs to send URLs and parse responses.
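The scheduler around that call can stay simple. Here's a minimal batch dispatcher sketch that pairs with the collect_price helper above; the batch size and pacing are illustrative assumptions, and a production system would re-queue the failed list rather than drop it:

```python
import time

BATCH_SIZE = 50        # illustrative; tune to your API plan's concurrency
PAUSE_SECONDS = 0.1    # illustrative pacing between batches

def dispatch(jobs, collect_fn):
    """Send jobs in fixed-size batches; return successes and failures separately."""
    results, failed = [], []
    for i in range(0, len(jobs), BATCH_SIZE):
        for job in jobs[i:i + BATCH_SIZE]:
            try:
                results.append((job, collect_fn(job)))
            except Exception:
                failed.append(job)  # candidate for a retry pass
        time.sleep(PAUSE_SECONDS)
    return results, failed
```

Because the API absorbs the per-request retry and anti-blocking logic, the dispatcher only needs to track which jobs ultimately succeeded within the collection window.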

3. Price Extraction and Normalization

Raw HTML goes through a parser that extracts the price value, normalizes currency, and handles edge cases (sale prices, "from" ranges, out-of-stock indicators).
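A sketch of what that normalization step might look like. The rules here (US-style thousands separators, English out-of-stock phrases, three currency symbols) are illustrative assumptions; real parsers need per-site and per-locale rules:

```python
import re

def normalize_price(raw):
    """Parse a raw price string into (amount, currency_code, in_stock).

    Assumes US-style separators ("1,299.99"). A "from $499 - $599" range
    yields the lower bound, since the regex takes the first number found.
    """
    text = raw.lower()
    if "out of stock" in text or "unavailable" in text:
        return None, None, False
    symbols = {"$": "USD", "€": "EUR", "£": "GBP"}
    code = next((c for sym, c in symbols.items() if sym in raw), None)
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if not match:
        return None, code, True
    amount = float(match.group(0).replace(",", ""))
    return amount, code, True
```

Keeping this step separate from collection matters: when a site changes its price formatting, only the parser needs updating, not the fetch layer.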

4. Change Detection and Alerts

Every new price is compared against the previous reading. Significant changes (typically a 2-5% threshold) trigger alerts to analysts or automated repricing systems.
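That comparison reduces to a few lines. A minimal sketch using an illustrative 2% threshold (the low end of the range above), returning the relative change only when it's significant enough to act on:

```python
ALERT_THRESHOLD = 0.02  # 2% -- illustrative; firms typically tune this per category

def detect_change(previous, current, threshold=ALERT_THRESHOLD):
    """Return the relative price change if it exceeds the threshold, else None."""
    if previous is None or current is None or previous == 0:
        return None  # no baseline to compare against
    change = (current - previous) / previous
    return change if abs(change) >= threshold else None
```

Returning None for sub-threshold moves keeps noise out of the alert stream, so analysts and repricing systems only see changes worth reacting to.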

Key Challenges

Site-specific complexity: Each competitor site has a unique layout, protection level, and rendering behavior. A one-size-fits-all approach fails quickly.

Data freshness: Stale prices are worse than no prices. The system must complete its daily collection within the time window, which means handling failures and retries efficiently.

Cost management: At 30,000 requests per day, infrastructure costs add up. Using the right task type for each target (single when possible, browser only when needed) reduces cost significantly.
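To make that tradeoff concrete, here's a back-of-the-envelope cost model. The per-request prices are purely illustrative placeholders, not FourA's actual rates; the point is the shape of the math, not the numbers:

```python
# Purely illustrative per-request prices -- NOT FourA's actual rates.
PRICE_PER_REQUEST = {"single": 0.0005, "proxy": 0.001, "browser": 0.005}

def daily_cost(counts):
    """Estimate daily spend given a request count per task type."""
    return sum(PRICE_PER_REQUEST[t] * n for t, n in counts.items())

# 30,000 requests/day: routing everything through a browser vs. matching
# each target to the cheapest task type that works for it.
all_browser = daily_cost({"browser": 30_000})
mixed = daily_cost({"single": 20_000, "proxy": 7_000, "browser": 3_000})
```

Under these placeholder rates the mixed strategy costs a fraction of the all-browser one, which is why the catalog's per-target task type field pays for itself.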

Why APIs Beat DIY

A firm that built this in-house would need to maintain proxy pools, browser farms, and anti-detection code for every target site. That infrastructure overhead is the real cost. It's not the engineering time to write the initial scraper; it's the ongoing maintenance to keep it working.

Data collection APIs like FourA absorb that complexity. The firm focuses on what actually differentiates them (product catalog, pricing algorithms, customer relationships) instead of keeping Chrome up to date.


The firms pulling ahead in price intelligence aren't the ones with the biggest scraping teams. They're the ones who stopped building infrastructure and started building better pricing models. That's where the real competitive edge lives.

Learn more in the how-to guide and the API reference.