Scrape a Dynamic Website
Dynamic websites load content using JavaScript after the initial page load. This guide shows how to collect data from these sites using FourA's browser endpoint.
The Problem
When you send a standard HTTP request to a JavaScript-heavy website, you get the HTML shell but not the actual content. The data you need (product listings, prices, search results) is added by JavaScript that runs only after the page loads in a browser.
This is increasingly common with modern frameworks like React, Vue, Angular, and Next.js.
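You can see the difference yourself by comparing the text a plain HTTP client receives with what a browser ends up showing. The sketch below uses only Python's standard library. `visible_text` is a hypothetical helper, and the `SHELL` and `RENDERED` strings are made-up sample pages, not output from FourA or any real site.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only text and anything inside script/style.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the human-readable text of an HTML document, space-joined."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A typical client-rendered "shell": the real content is filled in by bundle.js.
SHELL = """<html><head><title>Shop</title></head>
<body><div id="root"></div><script src="/bundle.js"></script></body></html>"""

# The same page after JavaScript has run in a browser.
RENDERED = """<html><head><title>Shop</title></head>
<body><div id="root"><div class="product-grid">
<div class="product-card"><span class="product-name">Widget</span></div>
</div></div></body></html>"""

print(visible_text(SHELL))     # → Shop
print(visible_text(RENDERED))  # → Shop Widget
```

In practice you would pass in the raw HTML from a plain request (for example `urllib.request.urlopen(url).read().decode()`). If `visible_text` returns little more than the page title, the site is probably rendering its content client-side.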
The Solution: Browser Requests
FourA's browser endpoint (POST /api/browser/) opens your URL in a real Chrome browser that:
- Loads the page
- Executes all JavaScript
- Waits for the content to render
- Returns the fully rendered HTML
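The curl and requests examples in this guide call the endpoint directly. If you prefer not to add a dependency, the same call can be made with Python's standard library. This is a minimal sketch: `build_browser_request` and `fetch_rendered_html` are hypothetical helper names. Only the `url`, `timeout_ms`, and `checkText` fields and the `body` response field come from this guide.

```python
import json
import urllib.request

BROWSER_ENDPOINT = "https://eu.api.foura.ai/api/browser/"


def build_browser_request(api_key, url, timeout_ms=15000, check_text=None):
    """Build (but do not send) a POST request for FourA's browser endpoint."""
    payload = {"url": url, "timeout_ms": timeout_ms}
    if check_text is not None:
        payload["checkText"] = check_text  # optional load check
    return urllib.request.Request(
        BROWSER_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fetch_rendered_html(request, client_timeout_s=60):
    """Send the request and return the rendered HTML from the "body" field."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=client_timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))["body"]
```

Usage: `html = fetch_rendered_html(build_browser_request("YOUR_API_KEY", "https://example.com/products", check_text="product-grid"))`. Building the request and sending it are kept separate, so you can inspect the payload before spending a request.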
Step 1: Identify What You Need
Before making the request, visit the target page in your browser and use DevTools (F12) to find a piece of text or element that confirms the content has loaded. For example:
- A product name that appears after JS renders
- A CSS class like "product-grid" in the rendered HTML
- A text string like "results" that only appears when data loads
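A good marker is missing from the raw HTML (what View Source shows) but present in the rendered page (what DevTools shows). A string that is already in the raw HTML would match immediately, so it cannot confirm that the dynamic content has loaded. If you save copies of both versions, a quick filter narrows down the candidates. The function name and sample strings below are illustrative only.

```python
def candidate_check_texts(raw_html, rendered_html, candidates):
    """Keep candidate markers that appear only after JavaScript has run."""
    return [c for c in candidates if c in rendered_html and c not in raw_html]


# Hypothetical saved copies of the same page.
raw = '<div id="root"></div><script src="/bundle.js"></script>'
rendered = '<div id="root"><div class="product-grid">12 results</div></div>'

print(candidate_check_texts(raw, rendered, ["root", "product-grid", "results", "Sold out"]))
# → ['product-grid', 'results']
```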
Step 2: Send a Browser Request
curl -X POST https://eu.api.foura.ai/api/browser/ \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/products",
"timeout_ms": 15000,
"checkText": "product-grid"
}'
The checkText option tells FourA to verify that the string "product-grid" appears in the rendered page. If it doesn't appear before the timeout, the request fails, letting you know the content didn't load.
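Conceptually, `checkText` acts as a "wait until this text is present" check with a deadline. The sketch below models that behaviour locally to make the semantics concrete. It is an illustration under that assumption, not how FourA implements the check. `get_html` is a hypothetical stand-in for re-reading the live page's HTML.

```python
import time


def wait_for_text(get_html, check_text, timeout_ms=15000, interval_ms=250):
    """Poll get_html() until check_text appears or timeout_ms elapses.

    Returns the matching HTML, or raises TimeoutError (the "request fails" case).
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        html = get_html()
        if check_text in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{check_text!r} not found within {timeout_ms} ms")
        time.sleep(interval_ms / 1000)


# Simulated page that "finishes loading" on the third read.
snapshots = iter(['<div id="root"></div>', '<div id="root">Loading...</div>',
                  '<div class="product-grid">Widget</div>'])
print(wait_for_text(lambda: next(snapshots), "product-grid", interval_ms=10))
# → <div class="product-grid">Widget</div>
```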
Step 3: Parse the HTML
The response contains the fully rendered HTML in the body field. Parse it with your preferred library:
Python (BeautifulSoup)
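import requests
from bs4 import BeautifulSoup
resp = requests.post(
    "https://eu.api.foura.ai/api/browser/",
    headers={
        "X-API-Key": "YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com/products",
        "timeout_ms": 15000,          # how long FourA may wait for the page
        "checkText": "product-grid",  # the request fails if this never appears
    },
    timeout=60,  # client-side limit in seconds; keep it well above timeout_ms
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error response
html = resp.json()["body"]  # the fully rendered HTML
soup = BeautifulSoup(html, "html.parser")
for product in soup.select(".product-card"):
    name = product.select_one(".product-name")
    price = product.select_one(".product-price")
    if name and price:  # skip cards that are missing either field
        print(f"{name.get_text(strip=True)}: {price.get_text(strip=True)}")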
Node.js (cheerio)
// Runs as an ES module (top-level await) on Node.js 18+, which has fetch built in
import * as cheerio from 'cheerio';
const resp = await fetch('https://eu.api.foura.ai/api/browser/', {
  method: 'POST',
  headers: { 'X-API-Key': 'YOUR_API_KEY', 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://example.com/products',
    timeout_ms: 15000,
    checkText: 'product-grid'
  })
});
if (!resp.ok) throw new Error(`FourA request failed with HTTP ${resp.status}`);
const { body: html } = await resp.json(); // the fully rendered HTML
const $ = cheerio.load(html);
$('.product-card').each((_, el) => {
  const name = $(el).find('.product-name').text().trim();
  const price = $(el).find('.product-price').text().trim();
  console.log(`${name}: ${price}`);
});
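If you'd rather avoid third-party packages in Python, the standard library's `html.parser` can handle simple class-based extraction. This is a minimal sketch that assumes well-formed markup, meaning every non-void element is explicitly closed, and reuses the example class names from above. `texts_by_class` is a hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # nesting depth inside a matching element (0 = outside)
        self.buffer = []  # text pieces of the element currently being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested tag inside a match
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering a matching element
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # left the matching element: store its text
            self.results.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def texts_by_class(html, class_name):
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


html = """<div class="product-grid">
  <div class="product-card"><span class="product-name">Widget</span>
    <span class="product-price">$10</span></div>
  <div class="product-card"><span class="product-name">Gadget</span>
    <span class="product-price">$20</span></div>
</div>"""

for name, price in zip(texts_by_class(html, "product-name"), texts_by_class(html, "product-price")):
    print(f"{name}: {price}")
# Widget: $10
# Gadget: $20
```

Pairing names and prices by position assumes every card has both fields. The BeautifulSoup and cheerio versions look up fields within each card, so they stay correct when a field is missing.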
Troubleshooting
Still getting empty content?
- Verify that the page actually uses JavaScript rendering: compare "View Source" (the raw HTML) with the DevTools Elements panel (the rendered page)
- Increase timeout_ms: some pages load slowly
- Check whether the page requires authentication or cookies (use the cookies parameter)
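If you find yourself raising timeout_ms by hand, you can retry with a growing timeout instead. This sketch rests on a few assumptions. `send` is a hypothetical function you supply, for example a wrapper around the requests call from Step 3, that takes the JSON payload and raises when the request fails. The 60,000 ms ceiling is an arbitrary cap chosen for illustration, not a documented FourA limit.

```python
def fetch_with_growing_timeout(send, url, check_text, start_ms=15000,
                               max_ms=60000, factor=2, retry_on=(Exception,)):
    """Retry a browser request with a larger timeout_ms after each failure.

    Narrow retry_on to the exception types your client raises for failed loads.
    """
    timeout_ms = start_ms
    while True:
        payload = {"url": url, "timeout_ms": timeout_ms, "checkText": check_text}
        try:
            return send(payload)
        except retry_on:
            if timeout_ms >= max_ms:
                raise  # give up: even the largest timeout failed
            timeout_ms = min(timeout_ms * factor, max_ms)
```

Doubling keeps the number of attempts small (15 s, 30 s, 60 s with the defaults), so a slow page costs at most three requests before the last error is re-raised.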
Getting a captcha page?
- For single/HTTP requests, switch to the proxy endpoint (POST /api/proxy/) for automatic IP rotation.
- To add proxy rotation to browser requests, use the browser endpoint's proxy parameter instead of wrapping the request in the proxy endpoint. The proxy endpoint only wraps single/HTTP requests, not browser requests.
curl -X POST "https://eu.api.foura.ai/api/browser/" \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "proxy": "http://proxy-url:port"}'
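To decide when to switch to a proxy, you can check the returned HTML for signs of a challenge page before parsing it. This is a minimal sketch. The phrases in `CAPTCHA_HINTS` are guesses, not a list FourA provides, and `looks_like_captcha` and `browser_payload` are hypothetical helpers. Only the `url`, `timeout_ms`, and `proxy` fields come from this guide.

```python
# Hypothetical phrases; adjust them to what the blocked page actually shows.
CAPTCHA_HINTS = ("captcha", "are you a robot", "verify you are human")


def looks_like_captcha(html):
    """Rough heuristic: True if the HTML mentions a common captcha phrase."""
    lowered = html.lower()
    return any(hint in lowered for hint in CAPTCHA_HINTS)


def browser_payload(url, timeout_ms=15000, proxy=None):
    """Build a browser-endpoint payload, routing through proxy when one is given."""
    payload = {"url": url, "timeout_ms": timeout_ms}
    if proxy:
        payload["proxy"] = proxy  # e.g. "http://proxy-url:port"
    return payload


page = "<h1>Please verify you are human</h1>"
if looks_like_captcha(page):
    # Retry through a proxy instead of parsing the challenge page.
    print(browser_payload("https://example.com", proxy="http://proxy-url:port"))
# → {'url': 'https://example.com', 'timeout_ms': 15000, 'proxy': 'http://proxy-url:port'}
```

A keyword check can produce false positives, for example on pages that load a captcha script without showing a challenge. Treat it as a trigger for a retry, not as proof the request was blocked.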
Next Steps
- Choosing the Right Endpoint: When to use browser vs. single
- Monitor Competitor Prices: Full price tracking tutorial
- Anti-Bot Protection: Handle protected sites