Scraping Hotel and Flight Prices: Travel Price Monitoring

Travel prices change constantly. This article shows how to scrape hotel and flight prices to find good deals.

Use Cases

  • Price alerts: Get notified when a price drops
  • Price comparison: Compare prices across multiple sources
  • Market analysis: Track seasonal price trends
  • Booking optimization: Find the best time to book
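
As a minimal sketch of the price-alert use case, the function below compares the latest observed price with the previous one and reports a drop once it reaches a threshold. The name `detect_price_drop`, the `min_drop_pct` parameter, and its 5% default are illustrative assumptions, not part of any library.

```python
def detect_price_drop(previous_price, current_price, min_drop_pct=5.0):
    """Return the drop in percent if it reaches min_drop_pct, else None.

    Hypothetical helper: both prices are plain numbers in the same currency.
    """
    if previous_price <= 0:
        return None  # no meaningful baseline to compare against
    drop_pct = (previous_price - current_price) / previous_price * 100
    return round(drop_pct, 2) if drop_pct >= min_drop_pct else None


# Example: a room that went from 1,200,000 to 1,050,000
drop = detect_price_drop(1_200_000, 1_050_000)
if drop is not None:
    print(f"Price dropped by {drop}%")
```

The notification itself (email, chat message, and so on) is left out here; the function only decides whether an alert is warranted.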

Travel Sites

  • booking.com
  • agoda.com
  • traveloka.com
  • vntrip.vn
  • skyscanner.com

Challenges

  • Strong anti-bot protection: Travel sites are very strict
  • Dynamic pricing: Prices vary by session
  • JavaScript-heavy pages: A headless browser is required
  • Geo-based pricing: Prices differ by location
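
Because strict anti-bot systems block some requests outright, a scraper usually needs to retry with growing delays rather than hammer the site. The sketch below shows one generic way to do that. `retry_with_backoff`, its parameters, and the injectable `sleep` function are hypothetical names chosen for illustration, and `fetch` stands in for any scraping call.

```python
import random
import time


def retry_with_backoff(fetch, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    Hypothetical helper: fetch is any zero-argument callable that raises
    an exception when the request is blocked or times out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff (2s, 4s, 8s, ...) plus up to 1s of jitter
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)
```

A scraping function can be wrapped in a lambda, for example `retry_with_backoff(lambda: scrape_hotels('Da Nang', '2026-03-01', '2026-03-03'))`.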

Hotel Scraper

from urllib.parse import urlencode

from playwright.sync_api import sync_playwright


def text_or_none(card, selector):
    # Return the element's text, or None when the card lacks that element
    element = card.query_selector(selector)
    return element.inner_text().strip() if element else None


def scrape_hotels(destination, checkin, checkout):
    # URL-encode the query so destinations with spaces ("Da Nang") stay valid
    query = urlencode({'ss': destination, 'checkin': checkin, 'checkout': checkout})
    url = f'https://www.booking.com/searchresults.html?{query}'

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={"server": "http://proxy.vinaproxy.com:8080"}
        )
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_selector('[data-testid="property-card"]')

            hotels = []
            for card in page.query_selector_all('[data-testid="property-card"]'):
                hotels.append({
                    'name': text_or_none(card, '[data-testid="title"]'),
                    'price': text_or_none(card, '[data-testid="price-and-discounted-price"]'),
                    'rating': text_or_none(card, '[data-testid="review-score"]'),
                    'location': text_or_none(card, '[data-testid="address"]'),
                })
            return hotels
        finally:
            # Close the browser even if a selector times out
            browser.close()


# Search hotels in Da Nang
hotels = scrape_hotels('Da Nang', '2026-03-01', '2026-03-03')
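
The scraper returns prices as display strings, which cannot be compared or averaged directly. A minimal sketch of normalizing them into numbers follows. It assumes separators only group thousands and that there is no decimal fraction (as is typical for VND); `parse_price` is a hypothetical helper, not part of Playwright.

```python
import re


def parse_price(text):
    """Extract an integer amount from a display string such as 'VND 1,250,000'.

    Assumption: separators (',', '.', spaces) only group thousands, so they
    can be dropped. Prices with decimal fractions need different handling.
    """
    if not text:
        return None
    # Take the first run of digits and separators in the string
    match = re.search(r'\d[\d.,\s]*', text)
    if not match:
        return None
    digits = re.sub(r'\D', '', match.group())
    return int(digits)
```

Applied to the scraped results, this could look like `[parse_price(h['price']) for h in hotels]`, with `None` marking cards that showed no price.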

Flight Price Tracker

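from playwright.sync_api import sync_playwright


def scrape_flights(origin, dest, date):
    url = f'https://www.skyscanner.com/transport/flights/{origin}/{dest}/{date}/'

    def text_or_none(node, selector):
        # Return the element's text, or None when the element is missing
        element = node.query_selector(selector)
        return element.inner_text().strip() if element else None

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # Wait until results render instead of sleeping a fixed 5 seconds
            page.wait_for_selector('.FlightsResults_dayViewItem')

            flights = []
            for result in page.query_selector_all('.FlightsResults_dayViewItem'):
                logo = result.query_selector('.LogoImage_container')
                flights.append({
                    'airline': logo.get_attribute('alt') if logo else None,
                    'price': text_or_none(result, '.Price_mainPriceContainer'),
                    'duration': text_or_none(result, '.Duration_duration'),
                    'stops': text_or_none(result, '.LegInfo_stopsLabelContainer'),
                })
            return flights
        finally:
            # Close the browser even if the results never appear
            browser.close()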
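
To find the cheapest day to fly, the same search can be repeated across a range of departure dates. The sketch below generates the dates and collects the results per day. `date_range`, `scan_dates`, and the `search` parameter are hypothetical names; `search` can be any function that takes a date string, such as a wrapper around `scrape_flights`. The `YYYY-MM-DD` default is an assumption, so adjust `date_format` to whatever the target site expects.

```python
from datetime import date, timedelta


def date_range(start, days):
    """Yield `days` consecutive dates beginning at `start`."""
    for offset in range(days):
        yield start + timedelta(days=offset)


def scan_dates(search, start, days, date_format='%Y-%m-%d'):
    """Run search(date_string) for each date and map date string -> results."""
    results = {}
    for day in date_range(start, days):
        key = day.strftime(date_format)
        results[key] = search(key)
    return results
```

For example, `scan_dates(lambda d: scrape_flights('HAN', 'SGN', d), date(2026, 3, 1), 7)` would collect one week of results, where the airport codes are only sample values.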

Price Tracking Over Time

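import sqlite3
from datetime import datetime


def save_prices(hotels, search_date):
    conn = sqlite3.connect('travel_prices.db')
    try:
        # Create the table on first run so the INSERT below cannot fail
        conn.execute('''
            CREATE TABLE IF NOT EXISTS hotel_prices (
                name TEXT,
                price TEXT,
                rating TEXT,
                scraped_at TEXT,
                search_date TEXT
            )
        ''')
        # Use one timestamp per run so rows from the same scrape group together
        scraped_at = datetime.now().isoformat()
        conn.executemany('''
            INSERT INTO hotel_prices
            (name, price, rating, scraped_at, search_date)
            VALUES (?, ?, ?, ?, ?)
        ''', [(h['name'], h['price'], h['rating'], scraped_at, search_date)
              for h in hotels])
        conn.commit()
    finally:
        conn.close()


# Track prices daily
# Cron job: 0 6 * * * python track_prices.py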
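
Once several days of rows have accumulated, the history can be queried to see how low each hotel's price has gone. The sketch below assumes a `hotel_prices` table with the columns used by `save_prices`, and additionally assumes that `price` holds a numeric value rather than the raw display string. `lowest_prices` is a hypothetical helper.

```python
import sqlite3


def lowest_prices(conn, search_date):
    """Return (name, lowest price, number of observations) per hotel."""
    rows = conn.execute('''
        SELECT name, MIN(price), COUNT(*)
        FROM hotel_prices
        WHERE search_date = ?
        GROUP BY name
        ORDER BY MIN(price)
    ''', (search_date,))
    return rows.fetchall()
```

Called as `lowest_prices(sqlite3.connect('travel_prices.db'), '2026-03-01')`, it lists the hotels for that stay date, cheapest first.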

Best Practices

  • Use residential proxies (required)
  • Rotate browser fingerprints
  • Vary search parameters
  • Scrape during off-peak hours
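
Several of these practices can be combined when setting up each browser session. The sketch below picks a random fingerprint profile and a randomized delay. The profile values and the names `random_context_options`, `jittered_delay`, and `is_off_peak` are illustrative assumptions, as is the 01:00 to 06:00 off-peak window. The dictionary keys are option names accepted by Playwright's `browser.new_context()` (`user_agent`, `viewport`, `locale`, `timezone_id`).

```python
import random

# Example profiles only; real deployments should use current, consistent values
PROFILES = [
    {
        'user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'viewport': {'width': 1920, 'height': 1080},
        'locale': 'en-US',
        'timezone_id': 'Asia/Ho_Chi_Minh',
    },
    {
        'user_agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'viewport': {'width': 1440, 'height': 900},
        'locale': 'vi-VN',
        'timezone_id': 'Asia/Ho_Chi_Minh',
    },
]


def random_context_options(rng=random):
    """Pick one profile to pass as browser.new_context(**options)."""
    return dict(rng.choice(PROFILES))


def jittered_delay(base_seconds=5.0, jitter_seconds=3.0, rng=random):
    """Return a delay in [base, base + jitter] so requests are not evenly spaced."""
    return base_seconds + rng.uniform(0, jitter_seconds)


def is_off_peak(hour, start=1, end=6):
    """True when the local hour falls in the assumed off-peak window [start, end)."""
    return start <= hour < end
```

A scraper could call `browser.new_context(**random_context_options())` before each search and sleep for `jittered_delay()` seconds between searches.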

VinaProxy + Travel Scraping

  • Residential IPs help bypass anti-bot protection
  • Geo-targeting for local prices
  • Priced at just $0.5/GB
