Competitor Price Monitoring

Set up automated price tracking across competitor websites using the FourA API.

What you'll build

A Python script that:

Fetches product pages from a list of competitor URLs
Extracts price data from the HTML
Writes the results to a CSV file
Runs on a schedule

Prerequisites

A FourA API key (get one here)
Python 3.8+
The requests and beautifulsoup4 packages

pip install requests beautifulsoup4

Step 1: Define your targets

Create a list of the product URLs to monitor, each with the CSS selector that locates its price:

targets = [
    {"name": "Competitor A - Widget", "url": "https://competitor-a.com/widget", "selector": ".price"},
    {"name": "Competitor B - Widget", "url": "https://competitor-b.com/products/widget", "selector": "[data-price]"},
    {"name": "Competitor C - Widget", "url": "https://competitor-c.com/item/123", "selector": ".product-price span"},
]
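
A typo in one entry would otherwise only surface as a missing price at run time, so it can help to check the list before fetching anything. The following is a minimal sketch; validate_targets and REQUIRED_KEYS are hypothetical names introduced here for illustration and are not part of FourA.

```python
from urllib.parse import urlparse

# Keys every target entry is expected to carry (hypothetical helper constant)
REQUIRED_KEYS = ("name", "url", "selector")

def validate_targets(targets):
    """Return a list of human-readable problems; an empty list means every target looks usable."""
    problems = []
    for index, target in enumerate(targets):
        missing = [key for key in REQUIRED_KEYS if not target.get(key)]
        if missing:
            problems.append(f"target {index}: missing {', '.join(missing)}")
            continue
        parsed = urlparse(target["url"])
        # Only absolute http(s) URLs can be fetched
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"target {index} ({target['name']}): invalid URL {target['url']!r}")
    return problems
```

Calling validate_targets(targets) at the start of the script and stopping when the returned list is non-empty keeps a broken entry from producing hours of empty rows.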

Step 2: Fetch pages through FourA

import requests
import time

SINGLE_URL = "https://eu.api.foura.ai/api/single/"
BROWSER_URL = "https://eu.api.foura.ai/api/browser/"
API_KEY = "YOUR_API_KEY"

HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def fetch_page(url, use_browser=False, max_retries=3):
    # Choose the endpoint, request payload, and response field for the selected mode
    if use_browser:
        endpoint, field = BROWSER_URL, "body"
        payload = {"url": url, "timeout_ms": 15000}
    else:
        endpoint, field = SINGLE_URL, "data"
        payload = {"method": "GET", "url": url, "unblocker": True}

    for attempt in range(max_retries + 1):
        # Client-side timeout (an arbitrary 60 s) so a stalled request cannot hang the run
        resp = requests.post(endpoint, headers=HEADERS, json=payload, timeout=60)
        if resp.status_code != 429:
            return resp.json().get(field, "")
        if attempt < max_retries:
            time.sleep(5)  # Rate limited: wait before retrying
    return ""  # Still rate limited after max_retries retries
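
fetch_page waits a fixed 5 seconds after a 429 response. If rate limits are hit often, a growing delay may be gentler on the API. The sketch below shows one way to do that with exponential backoff; post_with_backoff and its parameters are hypothetical names, and the delays are illustrative rather than values recommended by FourA.

```python
import time

def post_with_backoff(send, max_retries=4, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or the retries run out.

    send is any zero-argument callable returning an object with a status_code
    attribute, for example:
        lambda: requests.post(SINGLE_URL, headers=HEADERS, json=payload)
    where payload is the request body built for that endpoint.
    """
    resp = send()
    for attempt in range(max_retries):
        if resp.status_code != 429:
            break
        # Delay doubles on each retry: base, 2*base, 4*base, ... capped at max_delay
        sleep(min(base_delay * (2 ** attempt), max_delay))
        resp = send()
    return resp
```

The sleep parameter exists only so the waiting can be replaced in tests. The caller still has to check the status code of the returned response, because it may be a 429 if every retry was rate limited.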

Step 3: Extract the price

from bs4 import BeautifulSoup
import re

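def extract_price(html, selector):
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)
    if not element:
        return None
    # Extract the numeric price from text like "$49.99", "49,99 EUR", or "$1,299.99"
    text = element.get_text(strip=True)
    match = re.search(r'\d[\d.,]*', text)
    if not match:
        return None
    number = match.group().rstrip('.,')
    last_sep = max(number.rfind('.'), number.rfind(','))
    if last_sep != -1 and len(number) - last_sep - 1 in (1, 2):
        # Heuristic: a final separator followed by 1-2 digits is the decimal mark
        whole = re.sub(r'[.,]', '', number[:last_sep])
        return float(f"{whole}.{number[last_sep + 1:]}")
    # Otherwise every separator is treated as a thousands separator
    return float(re.sub(r'[.,]', '', number))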
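
A selector that matches the wrong element can still yield a number, for example a review count instead of a price. A simple plausibility check on the value returned by extract_price can catch such cases. The following is a rough sketch; is_plausible_price and the 50% threshold are assumptions chosen for illustration and should be tuned to your products.

```python
def is_plausible_price(price, previous=None, max_change=0.5):
    """Return True if price looks like a real price rather than a mis-selected number."""
    if price is None or price <= 0:
        return False
    if previous is None or previous <= 0:
        return True  # Nothing to compare against yet
    # Flag jumps larger than max_change (0.5 = 50%) relative to the last known price
    return abs(price - previous) / previous <= max_change
```

For example, is_plausible_price(3.0, previous=49.99) returns False, which is a hint to inspect the page and its selector before trusting the reading.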

Step 4: Run and log the results

import csv
from datetime import datetime

def monitor_prices():
    timestamp = datetime.now().isoformat()
    results = []

    for target in targets:
        html = fetch_page(target["url"])
        price = extract_price(html, target["selector"])
        results.append({
            "timestamp": timestamp,
            "name": target["name"],
            "url": target["url"],
            "price": price
        })
        print(f"{target['name']}: {price}")
        time.sleep(1)  # Be polite

    # Append to CSV
    with open("prices.csv", "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "name", "url", "price"])
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(results)

if __name__ == "__main__":
    monitor_prices()
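
Once prices.csv holds more than one run, the file can be used to spot price changes. The sketch below compares the two most recent readings per product; find_price_changes is a hypothetical helper, and it assumes the column names written by monitor_prices above.

```python
import csv
from collections import defaultdict

def find_price_changes(csv_path="prices.csv"):
    """Compare the two most recent prices per product and return those that differ."""
    history = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["price"]:  # Skip rows where extraction failed (empty cell)
                history[row["name"]].append((row["timestamp"], float(row["price"])))

    changes = []
    for name, entries in history.items():
        entries.sort()  # ISO 8601 timestamps sort chronologically as strings
        if len(entries) >= 2 and entries[-1][1] != entries[-2][1]:
            changes.append({"name": name, "old": entries[-2][1], "new": entries[-1][1]})
    return changes
```

Calling find_price_changes() after monitor_prices() would list the products whose latest price differs from the previous reading, which is a natural place to hook in a notification.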

Step 5: Schedule it

Run the script every hour with cron:

crontab -e
# Add this line:
0 * * * * cd /path/to/project && python3 monitor.py >> monitor.log 2>&1

Tips

Start with the single endpoint and switch to the browser endpoint if the page relies on JavaScript rendering.
Add error handling: websites can change their layout. Log failures separately.
Keep selectors up to date: when a competitor redesigns its site, update the CSS selectors.
Respect the websites: space out requests, avoid peak hours, and follow robots.txt.
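
The first two tips can be combined in one helper: try the single endpoint, fall back to the browser endpoint when no price is found, and record failures in a dedicated log. This is a minimal sketch, not an official pattern; get_price_with_fallback and the logger name are hypothetical, and fetch and extract are passed in so that fetch_page and extract_price from the steps above can be plugged in.

```python
import logging

# Separate logger so extraction failures do not get lost in regular output
error_log = logging.getLogger("price_monitor.errors")

def get_price_with_fallback(target, fetch, extract):
    """Try the single endpoint first, then the browser endpoint if no price was found.

    fetch(url, use_browser) and extract(html, selector) are expected to behave
    like fetch_page and extract_price.
    """
    for use_browser in (False, True):
        mode = "browser" if use_browser else "single"
        try:
            html = fetch(target["url"], use_browser)
            price = extract(html, target["selector"])
        except Exception as exc:  # Network errors, bad JSON, unparsable numbers
            error_log.warning("%s: %s endpoint failed: %s", target["name"], mode, exc)
            continue
        if price is not None:
            return price
    error_log.error("%s: no price found with selector %r", target["name"], target["selector"])
    return None
```

Adding a handler such as logging.FileHandler("errors.log") to error_log sends these messages to their own file. Inside monitor_prices, the call would look like get_price_with_fallback(target, fetch_page, extract_price).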

Next steps

Choosing the Right Endpoint: pick the best approach for each page
Error Handling: handle errors gracefully
Scrape a Dynamic Website: handle JavaScript-rendered pages

Updated: April 27, 2026