How to Scrape Amazon Product Data with Python (2026)

Amazon is a vast source of data for price monitoring and market research. This article shows how to scrape Amazon product data with Python.

Why Scrape Amazon?

Tracking competitor prices
Analyzing reviews
Collecting data for AI/ML
Market research

Challenges of Scraping Amazon

Strong anti-bot protection
Frequent CAPTCHAs
Rate limiting
HTML structure that changes frequently
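
Because of these obstacles, it helps to check whether a response is a real product page before parsing it. The sketch below is one possible heuristic, not a definitive detector: the function name looks_blocked, the status codes, and the marker strings are all assumptions based on commonly reported block pages, and they may stop matching at any time.

```python
# Hypothetical markers: strings commonly reported on Amazon's robot-check page.
# They are assumptions and may change without notice.
BLOCK_MARKERS = (
    "validatecaptcha",
    "enter the characters you see below",
    "automated access",
)

def looks_blocked(status_code, html):
    """Return True if the response looks like a block or CAPTCHA page."""
    # Status codes that typically signal throttling or refusal.
    if status_code in (403, 429, 503):
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)

# Offline samples, so the check can be tried without a network call.
captcha_page = '<form action="/errors/validateCaptcha">Enter the characters you see below</form>'
product_page = '<span id="productTitle">Widget</span>'

print(looks_blocked(200, captcha_page))
print(looks_blocked(200, product_page))
```

A scraper can call this right after each request and back off or switch IP when it returns True, instead of silently storing empty fields.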

What Is an ASIN?

ASIN stands for Amazon Standard Identification Number. Each product has one unique ASIN, which is 10 characters long.

Extracting the ASIN from a URL

import re

def extract_asin(url):
    """Return the 10-character ASIN from a /dp/ product URL, or None."""
    match = re.search(r'/dp/([A-Z0-9]{10})', url)
    return match.group(1) if match else None

# Example
url = "https://www.amazon.com/dp/B0BNWCWG7L"
print(extract_asin(url))  # B0BNWCWG7L
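
Product links do not always use the bare /dp/ form. The variant below is a minimal sketch that also accepts /gp/product/ and /gp/aw/d/ paths and rejects identifiers longer than 10 characters. The name extract_asin_any and the list of path shapes are assumptions for illustration, not an exhaustive description of Amazon's URL scheme.

```python
import re

# Path shapes assumed here: /dp/<ASIN>, /gp/product/<ASIN>, /gp/aw/d/<ASIN>.
# The lookahead requires the ASIN to end at '/', '?', '#', or the end of the URL.
ASIN_PATTERN = re.compile(r'/(?:dp|gp/product|gp/aw/d)/([A-Z0-9]{10})(?=[/?#]|$)')

def extract_asin_any(url):
    """Return the ASIN from several common product URL shapes, or None."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None

examples = [
    "https://www.amazon.com/Some-Product-Name/dp/B0BNWCWG7L/ref=sr_1_1?keywords=x",
    "https://www.amazon.com/gp/product/B0BNWCWG7L?th=1",
    "https://www.amazon.com/s?k=laptop",  # search page, no ASIN
]
for example in examples:
    print(extract_asin_any(example))
```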

Scraping Amazon Through a Proxy

Environment Setup

pip install requests beautifulsoup4

Basic Scraper Code

import requests
from bs4 import BeautifulSoup

# VinaProxy proxy configuration (replace user:pass with your own credentials)
proxies = {
    'http': 'http://user:pass@proxy.vinaproxy.com:8080',
    'https': 'http://user:pass@proxy.vinaproxy.com:8080'
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}

def scrape_amazon_product(asin):
    """Fetch a product page by ASIN and return its title, price, and rating."""
    url = f'https://www.amazon.com/dp/{asin}'
    
    response = requests.get(
        url, 
        headers=headers,
        proxies=proxies,
        timeout=30
    )
    # Fail loudly on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Extract data; each lookup returns None if the element is missing
    title = soup.find('span', id='productTitle')
    # Note: 'a-price-whole' holds only the whole-number part of the price
    price = soup.find('span', class_='a-price-whole')
    # First 'a-icon-alt' span on the page, e.g. "4.5 out of 5 stars"
    rating = soup.find('span', class_='a-icon-alt')
    
    return {
        'asin': asin,
        'title': title.text.strip() if title else None,
        'price': price.text.strip() if price else None,
        'rating': rating.text.strip() if rating else None
    }

# Usage
data = scrape_amazon_product('B0BNWCWG7L')
print(data)
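
The scraper above returns price and rating as raw strings, which are awkward for price monitoring. The helpers below sketch one way to turn them into numbers. The names parse_price and parse_rating are hypothetical. They assume US-style formatting (comma as thousands separator) and a rating text of the form "4.5 out of 5 stars". Passing the cents separately assumes the page exposes them in a sibling element such as a-price-fraction, which is worth verifying against the live HTML.

```python
import re

def parse_price(whole_text, fraction_text=None):
    """Combine '1,299.' and an optional '99' into 1299.99; None if no digits."""
    digits = re.sub(r'[^\d]', '', whole_text or '')
    if not digits:
        return None
    # Default to zero cents when no fraction text is supplied.
    fraction = re.sub(r'[^\d]', '', fraction_text or '') or '0'
    return float(f'{digits}.{fraction}')

def parse_rating(alt_text):
    """Return the first number in the rating text as a float, or None."""
    match = re.search(r'\d+(?:\.\d+)?', alt_text or '')
    return float(match.group()) if match else None

print(parse_price('1,299.', '99'))
print(parse_rating('4.5 out of 5 stars'))
```

Both helpers return None rather than raising when the input is missing, which mirrors how the scraper already treats absent elements.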

Best Practices

Rotate the User-Agent: change it on every request
Random delays: wait 2-5 seconds between requests
Residential proxies: reduce the chance of detection
Retry logic: handle failures gracefully
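
Three of these practices (User-Agent rotation, random delays, and retries) can be combined in one small wrapper. This is a minimal sketch under stated assumptions: fetch_with_retries and its parameters are hypothetical names, the User-Agent strings are only illustrative, and fetch stands for any callable you supply that returns the page body or raises on failure.

```python
import random
import time

# Illustrative User-Agent strings; maintain your own up-to-date list.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/17.0 Safari/605.1.15',
]

def fetch_with_retries(fetch, url, max_attempts=3, delay_range=(2, 5),
                       sleep=time.sleep):
    """Call fetch(url, headers) until it succeeds or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # Pick a fresh User-Agent for every attempt.
        headers = {'User-Agent': random.choice(USER_AGENTS)}
        try:
            return fetch(url, headers)
        except Exception as exc:
            last_error = exc
            # Wait a random 2-5 seconds before the next attempt.
            if attempt < max_attempts:
                sleep(random.uniform(*delay_range))
    raise last_error

# Offline demonstration: a fake fetch that fails twice, then succeeds.
attempts = []
def flaky_fetch(url, headers):
    attempts.append(headers['User-Agent'])
    if len(attempts) < 3:
        raise RuntimeError('blocked')
    return '<html>ok</html>'

print(fetch_with_retries(flaky_fetch, 'https://example.com', sleep=lambda s: None))
```

With requests, fetch could be a small function that calls requests.get with your proxies and a timeout, then raise_for_status(), and returns response.text. Injecting sleep keeps the wrapper easy to test without real waiting.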

VinaProxy for Amazon Scraping

Residential IPs that get past anti-bot checks
Automatic IP rotation
Priced at just $0.5/GB
Technical support in Vietnamese

See the eCommerce use case →