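Monitor Competitor Prices
Use the FourA API to set up automated price monitoring on competitor websites.
What You'll Build
A Python script that:
- Fetches product pages from a list of competitor URLs
- Extracts price data from the HTML
- Logs the results to a CSV file
- Runs on a schedule
Prerequisites
- A FourA API key (get one here)
- Python 3.8+
- The requests and beautifulsoup4 packages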
pip install requests beautifulsoup4
Step 1: Define Your Targets
Create a list of the product URLs to monitor, each with the CSS selector of its price element:
targets = [
    {"name": "Competitor A - Widget", "url": "https://competitor-a.com/widget", "selector": ".price"},
    {"name": "Competitor B - Widget", "url": "https://competitor-b.com/products/widget", "selector": "[data-price]"},
    {"name": "Competitor C - Widget", "url": "https://competitor-c.com/item/123", "selector": ".product-price span"},
]
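A typo in this list only shows up later as a failed fetch or a missing price, so it can help to sanity-check the entries first. The sketch below is one possible way to do that. `validate_targets` and `REQUIRED_KEYS` are hypothetical names, not part of FourA, and the check only covers the three keys used in this guide plus the shape of the URL.

```python
from urllib.parse import urlparse

# Keys used by the targets list in this guide
REQUIRED_KEYS = ("name", "url", "selector")

def validate_targets(targets):
    """Return a list of problems found; an empty list means every target looks usable."""
    problems = []
    for i, target in enumerate(targets):
        missing = [key for key in REQUIRED_KEYS if not target.get(key)]
        if missing:
            problems.append(f"target {i}: missing {', '.join(missing)}")
            continue
        parsed = urlparse(target["url"])
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"target {i} ({target['name']}): not an absolute http(s) URL")
    return problems
```

Calling `validate_targets(targets)` at the start of the script and stopping when the result is non-empty keeps bad entries from consuming API requests.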
Step 2: Fetch Pages via FourA
import requests
import time

SINGLE_URL = "https://eu.api.foura.ai/api/single/"
BROWSER_URL = "https://eu.api.foura.ai/api/browser/"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

def fetch_page(url, use_browser=False, retries=3):
    """Fetch a page through FourA and return its HTML as a string."""
    if use_browser:
        # Browser endpoint: for pages rendered with JavaScript
        endpoint, key = BROWSER_URL, "body"
        payload = {"url": url, "timeout_ms": 15000}
    else:
        # Single endpoint: plain GET request through the unblocker
        endpoint, key = SINGLE_URL, "data"
        payload = {"method": "GET", "url": url, "unblocker": True}
    # Client-side timeout so a stalled connection cannot hang the script
    resp = requests.post(endpoint, headers=HEADERS, json=payload, timeout=60)
    if resp.status_code == 429 and retries > 0:
        # Rate limited: wait, then retry a bounded number of times
        time.sleep(5)
        return fetch_page(url, use_browser, retries - 1)
    return resp.json().get(key, "")
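The retry above waits a fixed 5 seconds after a 429 response. If rate limits are hit often, a delay that grows with each attempt may be gentler on the API. The following is a minimal sketch of exponential backoff with optional jitter. `backoff_delays` and all of its default values are illustrative assumptions, not limits documented by FourA.

```python
import random

def backoff_delays(retries=4, base=5.0, factor=2.0, cap=60.0, jitter=0.0, rng=random):
    """Return the wait in seconds before each retry.

    The delay starts at `base`, is multiplied by `factor` on every attempt,
    and is limited to `cap`. `jitter` is a fraction (0.1 means up to +10%)
    added at random after capping, so parallel workers do not retry in lockstep.
    """
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * factor ** attempt)
        delay += delay * jitter * rng.random()
        delays.append(delay)
    return delays
```

With the defaults, the waits are 5, 10, 20, and 40 seconds. In `fetch_page`, the fixed `time.sleep(5)` could be replaced by sleeping for the next value from this list.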
Step 3: Extract the Price
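from bs4 import BeautifulSoup
import re

def extract_price(html, selector):
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)
    if not element:
        return None
    # Extract numeric price from text like "$49.99" or "49,99 EUR"
    text = element.get_text(strip=True)
    match = re.search(r'\d[\d,.]*', text)
    if not match:
        return None
    number = match.group().rstrip(',.')
    if ',' in number and '.' in number:
        # Both separators present: the one that appears last is the decimal mark
        thousands = ',' if number.rfind('.') > number.rfind(',') else '.'
        number = number.replace(thousands, '')
    try:
        return float(number.replace(',', '.'))
    except ValueError:
        return None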
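With a fetch function and an extract function in place, the two can be combined so that a target is first tried through the single endpoint and only retried through the browser endpoint when no price comes back. The sketch below shows one possible shape for that, and also logs every failure to a dedicated logger. `check_target`, `failure_log`, and the logger name are hypothetical. The fetch and extract functions are passed in as arguments, so the sketch assumes only their call signatures, `fetch(url, use_browser)` and `extract(html, selector)`.

```python
import logging

# Hypothetical logger name; attach a FileHandler to send failures to their own file
failure_log = logging.getLogger("price_monitor.failures")

def check_target(target, fetch, extract):
    """Try the single endpoint first, then the browser endpoint.

    Returns the extracted price, or None if both attempts fail.
    """
    for use_browser in (False, True):
        endpoint = "browser" if use_browser else "single"
        try:
            html = fetch(target["url"], use_browser)
            price = extract(html, target["selector"])
        except Exception as exc:  # network error, invalid JSON, parse error, ...
            failure_log.warning("%s: %s endpoint failed: %r",
                                target["name"], endpoint, exc)
            continue
        if price is not None:
            return price
        failure_log.warning("%s: no price found for selector %r via %s endpoint",
                            target["name"], target["selector"], endpoint)
    return None
```

A call such as `check_target(target, fetch_page, extract_price)` would slot into the monitoring loop. Because one failing site no longer raises, the remaining targets are still checked.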
Step 4: Run and Log the Results
import csv
from datetime import datetime

def monitor_prices():
    timestamp = datetime.now().isoformat()
    results = []
    for target in targets:
        html = fetch_page(target["url"])
        price = extract_price(html, target["selector"])
        results.append({
            "timestamp": timestamp,
            "name": target["name"],
            "url": target["url"],
            "price": price
        })
        print(f"{target['name']}: {price}")
        time.sleep(1)  # Be polite
    # Append to CSV
    with open("prices.csv", "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "name", "url", "price"])
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(results)

if __name__ == "__main__":
    monitor_prices()
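Once a few runs have been logged, the CSV can be read back to see which prices moved. The following sketch compares the two most recent successful readings for each product. `find_price_changes` is a hypothetical helper. It assumes the column names written above, and that a failed extraction appears as an empty price cell, which is how `csv.DictWriter` writes `None`.

```python
import csv
from collections import defaultdict

def find_price_changes(csv_path="prices.csv"):
    """Return (name, previous_price, latest_price) for each product whose price changed."""
    history = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["price"]:  # skip runs where no price was extracted
                history[row["name"]].append((row["timestamp"], float(row["price"])))
    changes = []
    for name, readings in history.items():
        # ISO 8601 timestamps written in the same format sort chronologically as strings
        readings.sort()
        if len(readings) >= 2 and readings[-1][1] != readings[-2][1]:
            changes.append((name, readings[-2][1], readings[-1][1]))
    return changes
```

The result could be printed after each run or fed into whatever alerting is already in use.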
Step 5: Schedule the Job
Use cron to run the script every hour:
crontab -e
# Add this line:
0 * * * * cd /path/to/project && python3 monitor.py >> monitor.log 2>&1
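Tips
- Start with the single endpoint, and switch to the browser endpoint if the page is rendered with JavaScript.
- Add error handling: sites change their layouts, so log failures separately.
- Keep selectors up to date: update the CSS selectors when a competitor redesigns its site.
- Respect the target sites: space out your requests, avoid peak hours, and follow robots.txt.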
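The robots.txt check in the last tip can be done with Python's standard library. The sketch below parses the text of a robots.txt file and asks whether a given URL may be fetched. `allowed_by_robots` is a hypothetical helper, and downloading the robots.txt file itself is left out. The `"*"` user agent is also an assumption, so substitute the agent name that applies to your requests.

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if `url` may be fetched according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Targets for which this returns False could be skipped before any request is sent.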
Next Steps
- Choosing the Right Endpoint: pick the best approach
- Error Handling: handle failures gracefully
- Scraping Dynamic Sites: handle JavaScript-rendered pages