Scraping Hotel and Flight Prices: Travel Price Monitoring
Travel prices change constantly. This article shows how to scrape hotel and flight prices to find good deals.
Use Cases
- Price alerts: get notified when a price drops
- Price comparison: compare prices across multiple sources
- Market analysis: follow seasonal price trends
- Booking optimization: find the best time to book
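As a minimal sketch of the price-alert use case, assuming prices have already been converted to numbers, the function below compares the latest price with the previous one and reports a drop that meets a threshold. The name `check_price_drop` and the 10% default are illustrative assumptions, not part of any library.

```python
def check_price_drop(previous_price, current_price, threshold_pct=10.0):
    """Return the drop in percent if it meets the threshold, else None."""
    if previous_price <= 0:
        return None  # No valid baseline to compare against
    drop_pct = (previous_price - current_price) / previous_price * 100
    return round(drop_pct, 2) if drop_pct >= threshold_pct else None


# Hypothetical prices in VND for the same hotel on two consecutive days
drop = check_price_drop(1_500_000, 1_200_000)
if drop is not None:
    print(f"Price dropped by {drop}%")
```

In a real tracker the two prices would come from stored scrape results, and the print call would be replaced by whatever notification channel you use.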
Travel Sites
- booking.com
- agoda.com
- traveloka.com
- vntrip.vn
- skyscanner.com
Challenges
- Strong anti-bot protection: travel sites are very strict
- Dynamic pricing: prices can change from one session to the next
- JavaScript-heavy pages: a headless browser is needed
- Geo-based pricing: prices differ by visitor location
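One way to approach session-based and geo-based pricing is to run each search in a fresh browser context whose reported locale and time zone match the market you want to observe. The sketch below only builds the keyword arguments. `locale`, `timezone_id`, and `user_agent` are real options of Playwright's `browser.new_context()`, while the helper name `context_options` and the Vietnam profile are assumptions for illustration. Sites typically infer location from the IP address, so this complements a geo-targeted proxy rather than replacing it.

```python
def context_options(locale, timezone_id, user_agent=None):
    """Build keyword arguments for Playwright's browser.new_context()."""
    options = {'locale': locale, 'timezone_id': timezone_id}
    if user_agent:
        options['user_agent'] = user_agent
    return options


# Hypothetical profile for a visitor browsing from Vietnam
vn_profile = context_options('vi-VN', 'Asia/Ho_Chi_Minh')

# Usage inside a Playwright session (not executed here):
#     context = browser.new_context(**vn_profile)
#     page = context.new_page()
#     ...
#     context.close()  # Discard cookies so the next search starts clean
```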
Hotel Scraper
from urllib.parse import urlencode

from playwright.sync_api import sync_playwright


def text_or_none(card, selector):
    # Return the text of the first matching element, or None if it is missing
    element = card.query_selector(selector)
    return element.inner_text().strip() if element else None


def scrape_hotels(destination, checkin, checkout):
    # URL-encode the query so destinations with spaces (e.g. "Da Nang") stay valid
    query = urlencode({'ss': destination, 'checkin': checkin, 'checkout': checkout})
    url = f'https://www.booking.com/searchresults.html?{query}'
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={"server": "http://proxy.vinaproxy.com:8080"}
        )
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_selector('[data-testid="property-card"]')
            hotels = []
            for card in page.query_selector_all('[data-testid="property-card"]'):
                # Any field may be absent on a given card, so each lookup is guarded
                hotels.append({
                    'name': text_or_none(card, '[data-testid="title"]'),
                    'price': text_or_none(card, '[data-testid="price-and-discounted-price"]'),
                    'rating': text_or_none(card, '[data-testid="review-score"]'),
                    'location': text_or_none(card, '[data-testid="address"]'),
                })
        finally:
            # Close the browser even if navigation or parsing fails
            browser.close()
    return hotels


# Search hotels in Da Nang
hotels = scrape_hotels('Da Nang', '2026-03-01', '2026-03-03')
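The scraper returns each price as display text, which cannot be compared or averaged directly. The sketch below converts such text to an integer. It assumes VND-style prices such as "VND 1,250,000", where `.` and `,` are thousands separators. The function name `parse_price` is hypothetical, and the approach would give wrong results for currencies that show decimal cents.

```python
import re


def parse_price(text):
    """Extract an integer amount from display text, or None if no digits found.

    Assumes '.' and ',' are thousands separators (VND-style prices),
    NOT decimal points.
    """
    if not text:
        return None
    match = re.search(r'\d[\d.,]*', text)
    if not match:
        return None
    digits = re.sub(r'[.,]', '', match.group())
    return int(digits)


# Made-up sample strings
for sample in ['VND 1,250,000', '₫ 1.250.000', 'Sold out']:
    print(sample, '->', parse_price(sample))
```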
Flight Price Tracker
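from playwright.sync_api import sync_playwright


def scrape_flights(origin, dest, date):
    # The CSS selectors below depend on the site's current markup and may need updating
    url = f'https://www.skyscanner.com/transport/flights/{origin}/{dest}/{date}/'
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_timeout(5000)  # Fixed 5-second wait for prices to load
            flights = []
            for result in page.query_selector_all('.FlightsResults_dayViewItem'):
                # Look up each element once; any of them may be missing
                logo = result.query_selector('.LogoImage_container')
                price = result.query_selector('.Price_mainPriceContainer')
                duration = result.query_selector('.Duration_duration')
                stops = result.query_selector('.LegInfo_stopsLabelContainer')
                flights.append({
                    'airline': logo.get_attribute('alt') if logo else None,
                    'price': price.inner_text() if price else None,
                    'duration': duration.inner_text() if duration else None,
                    'stops': stops.inner_text() if stops else None,
                })
        finally:
            # Close the browser even if navigation or parsing fails
            browser.close()
    return flights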
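To sort or filter flights by length, the scraped duration text has to become a number. The sketch below assumes the text looks like "2h 15m", "3h", or "45m". That format is an assumption about what the page displays, and `parse_duration_minutes` is a hypothetical helper, so adjust the pattern to the strings you actually collect.

```python
import re


def parse_duration_minutes(text):
    """Convert text such as '2h 15m' to total minutes, or None if unparseable."""
    if not text:
        return None
    match = re.fullmatch(r'\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*', text)
    if not match or not any(match.groups()):
        return None  # Neither an hours part nor a minutes part was found
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


# Made-up sample strings
for sample in ['2h 15m', '3h', '45m', 'Direct']:
    print(sample, '->', parse_duration_minutes(sample))
```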
Price Tracking Over Time
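import sqlite3
from datetime import datetime


def save_prices(hotels, search_date):
    conn = sqlite3.connect('travel_prices.db')
    try:
        cursor = conn.cursor()
        # Create the table on first run so the INSERT below cannot fail
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS hotel_prices
            (name TEXT, price TEXT, rating TEXT, scraped_at TEXT, search_date TEXT)
        ''')
        # Use one timestamp per run so all rows from a scrape share it
        scraped_at = datetime.now().isoformat()
        for hotel in hotels:
            cursor.execute('''
                INSERT INTO hotel_prices
                (name, price, rating, scraped_at, search_date)
                VALUES (?, ?, ?, ?, ?)
            ''', (hotel['name'], hotel['price'], hotel['rating'],
                  scraped_at, search_date))
        conn.commit()
    finally:
        conn.close()


# Track prices daily
# Cron job: 0 6 * * * python track_prices.py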
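Once several days of data have accumulated, the table can be queried for the lowest price seen per hotel. The sketch below uses an in-memory database with made-up rows and the same column names as the `hotel_prices` INSERT. It assumes prices are stored as integers; if they are stored as scraped text, `MIN` would compare them as strings and give misleading results. The function name `lowest_prices` is illustrative.

```python
import sqlite3


def lowest_prices(conn, search_date):
    """Return {hotel name: lowest recorded price} for one search date."""
    rows = conn.execute(
        '''
        SELECT name, MIN(price)
        FROM hotel_prices
        WHERE search_date = ?
        GROUP BY name
        ''',
        (search_date,),
    ).fetchall()
    return dict(rows)


# Demo with an in-memory database and made-up rows
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE hotel_prices '
    '(name TEXT, price INTEGER, rating TEXT, scraped_at TEXT, search_date TEXT)'
)
conn.executemany(
    'INSERT INTO hotel_prices VALUES (?, ?, ?, ?, ?)',
    [
        ('Hotel A', 1_500_000, '8.5', '2026-02-01T06:00:00', '2026-03-01'),
        ('Hotel A', 1_200_000, '8.5', '2026-02-02T06:00:00', '2026-03-01'),
        ('Hotel B', 900_000, None, '2026-02-01T06:00:00', '2026-03-01'),
    ],
)
print(lowest_prices(conn, '2026-03-01'))
```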
Best Practices
- Use residential proxies (required)
- Rotate browser fingerprints
- Vary search parameters
- Scrape off-peak hours
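Varying search parameters and pacing requests can be sketched as follows. The helper `build_search_plan` produces check-in/check-out pairs with randomly chosen stay lengths in shuffled order, and `random_delay_seconds` picks a pause between requests. Both names, the stay lengths, and the 5 to 20 second range are arbitrary assumptions for illustration, not recommended values.

```python
import random
from datetime import date, timedelta


def build_search_plan(start, days, nights_options=(1, 2, 3), seed=None):
    """Return shuffled (checkin, checkout) ISO date pairs, one per start date."""
    rng = random.Random(seed)
    plan = []
    for offset in range(days):
        checkin = start + timedelta(days=offset)
        nights = rng.choice(nights_options)
        checkout = checkin + timedelta(days=nights)
        plan.append((checkin.isoformat(), checkout.isoformat()))
    rng.shuffle(plan)  # Avoid querying dates in a predictable order
    return plan


def random_delay_seconds(rng, low=5.0, high=20.0):
    """Pick a pause between requests within [low, high] seconds."""
    return rng.uniform(low, high)


# Five check-in dates starting 1 March 2026, in randomized order
for checkin, checkout in build_search_plan(date(2026, 3, 1), 5, seed=42):
    print(checkin, checkout)
```

Each pair can then be passed to a scraper function, with `time.sleep(random_delay_seconds(rng))` between calls.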
VinaProxy + Travel Scraping
- Residential IPs help get past anti-bot checks
- Geo-targeting for local prices
- Priced at just $0.5/GB
