Price Monitoring with Web Scraping: Tracking Competitor Prices
Price monitoring is one of the most common use cases for web scraping. This article walks through building a price-tracking system.
Use Cases
- E-commerce: Track competitor prices
- Dropshipping: Monitor supplier prices
- Consumers: Price drop alerts
- Market research: Price trend analysis
Architecture
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Scraper   │────▶│  Database   │────▶│  Dashboard  │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │
       ▼                   ▼
┌─────────────┐     ┌─────────────┐
│    Proxy    │     │   Alerts    │
└─────────────┘     └─────────────┘
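The Proxy box routes the scraper's requests through intermediate IPs. Below is a minimal standard-library sketch, assuming a generic HTTP proxy that accepts credentials embedded in the URL. PROXY_URL, build_proxied_opener, and fetch_via_proxy are hypothetical names, and proxy.example.com is a placeholder for your provider's real endpoint.

```python
import urllib.request

# Hypothetical proxy endpoint: replace with your provider's host, port and credentials
PROXY_URL = "http://user:pass@proxy.example.com:8000"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(opener: urllib.request.OpenerDirector, url: str,
                    timeout: float = 10.0) -> str:
    """Download url through the proxy and decode the body as UTF-8."""
    request = urllib.request.Request(url, headers={"User-Agent": "price-monitor/0.1"})
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


opener = build_proxied_opener(PROXY_URL)
# html = fetch_via_proxy(opener, "https://shop.com/product-1")  # real network call
```

With requests, as used in the scraper code, the same mapping can be passed directly: `requests.get(url, proxies={'http': PROXY_URL, 'https': PROXY_URL})`.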
Scraper Code
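import requests
from bs4 import BeautifulSoup
import sqlite3
from datetime import datetime

def scrape_product(url):
    """Fetch a product page and extract its name and price; return None if not found."""
    response = requests.get(url, headers={'User-Agent': '...'}, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'lxml')
    name_el = soup.select_one('.product-name')
    price_el = soup.select_one('.price')
    if name_el is None or price_el is None:
        # Selectors no longer match: the page layout may have changed
        return None
    return {
        'url': url,
        'name': name_el.get_text(strip=True),
        # Strip the currency symbol and thousands separators, e.g. "$1,299.99"
        'price': float(price_el.get_text(strip=True).replace('$', '').replace(',', '')),
        'timestamp': datetime.now().isoformat(),
    }

def save_price(product, db_path='prices.db'):
    """Store one price observation (assumes the tables from Database Schema exist)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            # Register the product once; url is UNIQUE in the products table
            conn.execute('INSERT OR IGNORE INTO products (url, name) VALUES (?, ?)',
                         (product['url'], product['name']))
            product_id = conn.execute('SELECT id FROM products WHERE url = ?',
                                      (product['url'],)).fetchone()[0]
            conn.execute('INSERT INTO prices (product_id, price, timestamp) VALUES (?, ?, ?)',
                         (product_id, product['price'], product['timestamp']))
    finally:
        conn.close()

# Monitor multiple products
urls = [
    'https://shop.com/product-1',
    'https://shop.com/product-2',
]
for url in urls:
    product = scrape_product(url)
    if product is None:
        print(f"Price not found: {url}")
        continue
    save_price(product)
    print(f"{product['name']}: ${product['price']}")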
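Stripping only `$` (and commas) still fails on formats such as "19,99 €" or "1.299.000 ₫". A hedged sketch of a more tolerant parser follows. `parse_price` is a hypothetical helper, and its rule is an assumption rather than a standard: the last `.` or `,` counts as the decimal separator only when it is followed by 1 or 2 digits, and every other separator is treated as a thousands separator.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Extract a price from text like '$1,299.99', '19,99 €' or '1.299.000 ₫'."""
    match = re.search(r"\d[\d.,]*", text)
    if match is None:
        return None  # no digits at all: treat as "price not found"
    number = match.group(0).rstrip(".,")
    last_sep = max(number.rfind("."), number.rfind(","))
    # Heuristic: 1-2 digits after the last separator are read as decimals
    if last_sep != -1 and 1 <= len(number) - last_sep - 1 <= 2:
        integer_part = re.sub(r"[.,]", "", number[:last_sep])
        return Decimal(f"{integer_part}.{number[last_sep + 1:]}")
    # Otherwise every separator is a thousands separator
    return Decimal(re.sub(r"[.,]", "", number))


for sample in ["$1,299.99", "19,99 €", "1.299.000 ₫", "$25", "Out of stock"]:
    print(repr(sample), "->", parse_price(sample))
```

In `scrape_product` this would replace the `float(...)` conversion. Python's `sqlite3` module cannot bind `Decimal` values without a registered adapter, so convert with `float()` before calling `save_price`.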
Database Schema
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE,
    name TEXT
);
CREATE TABLE prices (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    price REAL,
    timestamp DATETIME,
    FOREIGN KEY (product_id) REFERENCES products(id)
);
CREATE INDEX idx_prices_timestamp ON prices(timestamp);
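This schema keeps product identity separate from price history, so each scrape appends one row to `prices`. The self-contained sketch below runs against an in-memory database and shows how a writer and a "latest price per product" query (the kind a dashboard needs) fit the schema. `record_price` and `latest_prices` are hypothetical helper names, and the rows are made-up sample data. The `MAX(timestamp)` comparison assumes all timestamps share one ISO 8601 format, so they sort correctly as text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, url TEXT UNIQUE, name TEXT);
CREATE TABLE prices (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    price REAL,
    timestamp DATETIME,
    FOREIGN KEY (product_id) REFERENCES products(id)
);
CREATE INDEX idx_prices_timestamp ON prices(timestamp);
"""


def record_price(conn, url, name, price, timestamp):
    """Insert the product if it is new, then append one price observation."""
    with conn:  # one transaction per observation
        conn.execute("INSERT OR IGNORE INTO products (url, name) VALUES (?, ?)", (url, name))
        (product_id,) = conn.execute("SELECT id FROM products WHERE url = ?", (url,)).fetchone()
        conn.execute(
            "INSERT INTO prices (product_id, price, timestamp) VALUES (?, ?, ?)",
            (product_id, price, timestamp),
        )


def latest_prices(conn):
    """Return (url, name, price, timestamp) for each product's most recent observation."""
    return conn.execute("""
        SELECT pr.url, pr.name, p.price, p.timestamp
        FROM prices p
        JOIN products pr ON pr.id = p.product_id
        WHERE p.timestamp = (
            SELECT MAX(timestamp) FROM prices WHERE product_id = p.product_id
        )
        ORDER BY pr.url
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FOREIGN KEY by default
record_price(conn, "https://shop.com/product-1", "Product 1", 19.99, "2024-05-01T08:00:00")
record_price(conn, "https://shop.com/product-1", "Product 1", 17.99, "2024-05-02T08:00:00")
record_price(conn, "https://shop.com/product-2", "Product 2", 5.00, "2024-05-01T08:00:00")
print(latest_prices(conn))
```

If two rows for the same product share a timestamp, this query returns both. A stricter variant would break ties on `prices.id`.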
Price Change Detection
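import sqlite3

def check_price_change(url, new_price, db_path='prices.db'):
    """Compare new_price with the latest stored price for url.

    Call this BEFORE save_price(); otherwise the latest row is the new price itself.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute('''
            SELECT p.price FROM prices p
            JOIN products pr ON pr.id = p.product_id
            WHERE pr.url = ?
            ORDER BY p.timestamp DESC LIMIT 1
        ''', (url,)).fetchone()
    finally:
        conn.close()
    if row:
        old_price = row[0]
        # Skip unchanged prices and a stored price of 0 (would divide by zero)
        if new_price != old_price and old_price != 0:
            change = ((new_price - old_price) / old_price) * 100
            return {
                'old': old_price,
                'new': new_price,
                'change_pct': change
            }
    return None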
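`check_price_change` only compares against the single latest row. Reconstructing every change in a product's history (for trend charts or for backfilling alerts) calls for a different technique, one the original code does not use: SQL's `LAG()` window function, which SQLite has supported since version 3.25. The sketch below assumes the schema above. `price_changes` is a hypothetical helper, and the sample rows are invented.

```python
import sqlite3


def price_changes(conn, url):
    """Return (timestamp, old_price, new_price, change_pct) for each change in url's history."""
    rows = conn.execute("""
        SELECT timestamp, prev_price, price
        FROM (
            SELECT p.timestamp,
                   p.price,
                   -- previous observation of the same product, ordered by time
                   LAG(p.price) OVER (PARTITION BY p.product_id ORDER BY p.timestamp) AS prev_price
            FROM prices p
            JOIN products pr ON pr.id = p.product_id
            WHERE pr.url = ?
        )
        WHERE prev_price IS NOT NULL AND price != prev_price AND prev_price != 0
        ORDER BY timestamp
    """, (url,)).fetchall()
    # Same percentage formula as check_price_change
    return [(ts, old, new, (new - old) / old * 100) for ts, old, new in rows]


# Demo on an in-memory database with made-up sample rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, url TEXT UNIQUE, name TEXT);
    CREATE TABLE prices (id INTEGER PRIMARY KEY, product_id INTEGER, price REAL, timestamp DATETIME);
    INSERT INTO products (id, url, name) VALUES (1, 'https://shop.com/product-1', 'Product 1');
    INSERT INTO prices (product_id, price, timestamp) VALUES
        (1, 20.0, '2024-05-01T08:00:00'),
        (1, 20.0, '2024-05-02T08:00:00'),
        (1, 15.0, '2024-05-03T08:00:00'),
        (1, 18.0, '2024-05-04T08:00:00');
""")
for change in price_changes(conn, "https://shop.com/product-1"):
    print(change)
```

Unchanged consecutive prices (the second row) and the first observation (which has no predecessor) are filtered out, so only actual changes are returned.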
Alert System
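def send_alert(product, price_change, threshold_pct=-10):
    """Notify when the price fell by more than 10% (change_pct below threshold_pct)."""
    if price_change and price_change['change_pct'] < threshold_pct:
        message = (
            "Price Drop Alert!\n"
            f"Product: {product['name']}\n"
            f"Old: ${price_change['old']}\n"
            f"New: ${price_change['new']}\n"
            f"Change: {price_change['change_pct']:.1f}%"
        )
        # Placeholder: replace print() with an email (smtplib), Telegram, or Slack notification
        print(message)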
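`send_alert` only prints the message. The sketch below shows one way to deliver it by email using the standard library's `smtplib` and `email.message.EmailMessage`. The SMTP host, port, addresses, and helper names are placeholders, and the server is assumed to support STARTTLS on port 587.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical settings: replace with your own SMTP server and addresses
SMTP_HOST = "smtp.example.com"
SMTP_PORT = 587
SENDER = "monitor@example.com"
RECIPIENT = "me@example.com"


def build_alert_email(product_name: str, old: float, new: float,
                      change_pct: float) -> EmailMessage:
    """Compose a plain-text price-drop email."""
    msg = EmailMessage()
    msg["Subject"] = f"Price drop: {product_name} ({change_pct:.1f}%)"
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    msg.set_content(
        f"Product: {product_name}\n"
        f"Old: ${old}\n"
        f"New: ${new}\n"
        f"Change: {change_pct:.1f}%\n"
    )
    return msg


def send_email(msg: EmailMessage, username: str, password: str) -> None:
    """Deliver msg over SMTP, upgrading the connection with STARTTLS (network call)."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=10) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)


msg = build_alert_email("Product 1", 20.0, 15.0, -25.0)
print(msg["Subject"])
# send_email(msg, "user", "app-password")  # uncomment with real credentials
```

Keeping message construction separate from delivery makes the formatting testable without a mail server, and lets you swap `send_email` for a Telegram or Slack call later.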
Best Practices
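- Scrape at consistent intervals so price histories are comparable over time
- Store historical data for trend analysis
- Handle a missing price element gracefully instead of crashing the whole run
- Detect website layout changes, e.g. selectors that suddenly stop matching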
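These practices can be built into the scraping loop: treat a missing price as a normal outcome, and flag a possible layout change when many products fail in the same cycle. The sketch below makes several assumptions. `run_cycle`, `next_run`, `CycleReport`, and the 50% threshold are hypothetical choices. `extract_price` stands for any function (such as a wrapper around `scrape_product`) that returns None when the price element is missing. `next_run` addresses consistent intervals by scheduling on a fixed time grid rather than sleeping a fixed delay after each run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CycleReport:
    prices: Dict[str, float] = field(default_factory=dict)
    missing: List[str] = field(default_factory=list)  # page loaded, price element not found
    errors: List[str] = field(default_factory=list)   # request or parsing raised an exception
    layout_changed: bool = False


def run_cycle(urls: List[str],
              extract_price: Callable[[str], Optional[float]],
              max_missing_ratio: float = 0.5) -> CycleReport:
    """Scrape each URL once; one missing price never aborts the whole cycle."""
    report = CycleReport()
    for url in urls:
        try:
            price = extract_price(url)
        except Exception:
            report.errors.append(url)  # network issue: not counted as a layout change
            continue
        if price is None:
            report.missing.append(url)
        else:
            report.prices[url] = price
    # Many misses in one cycle suggest broken selectors, not vanished products
    if urls and len(report.missing) / len(urls) > max_missing_ratio:
        report.layout_changed = True
    return report


def next_run(start: float, interval: float, now: float) -> float:
    """First time on the grid start, start + interval, ... strictly after now."""
    if now < start:
        return start
    return start + (int((now - start) // interval) + 1) * interval


def fake_extract(url: str) -> Optional[float]:
    """Stand-in scraper: one page parses, two have lost their price element."""
    pages = {"https://shop.com/a": 10.0, "https://shop.com/b": None, "https://shop.com/c": None}
    return pages[url]


report = run_cycle(["https://shop.com/a", "https://shop.com/b", "https://shop.com/c"],
                   fake_extract)
print(report.prices, report.missing, report.layout_changed)
```

A grid schedule keeps observations evenly spaced even when one cycle runs long, because slow cycles do not push every later run back.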
VinaProxy + Price Monitoring
- Monitor competitors 24/7 without getting blocked
- Residential IPs for e-commerce sites
- Only $0.5/GB
