Scraping Lazada with Python: A Step-by-Step Guide
Lazada is an Alibaba-owned e-commerce marketplace with a strong presence in Vietnam. This guide walks through scraping product data from Lazada.
Use Cases
- Price monitoring for sellers
- Competitor analysis
- Market research
- Product catalog building
Challenges
- Cloudflare protection
- JavaScript rendering required
- API rate limits
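A common way to handle rate limits is to retry with exponential backoff plus random jitter. The sketch below is generic and rests on some assumptions. `fetch` is any callable you supply, such as a wrapper around `requests.get` or `driver.get`. That callable is expected to raise `RateLimited` when it detects throttling (for example HTTP 429 or a block page). `RateLimited`, `backoff_delay`, `fetch_with_backoff` and the default delays are all hypothetical names and values, not anything Lazada documents.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal your fetch function raises on HTTP 429 or a block page."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_backoff(
    fetch: Callable[[str], T],
    url: str,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), sleeping and retrying whenever it raises RateLimited."""
    attempt = 0
    while True:
        try:
            return fetch(url)
        except RateLimited:
            if attempt >= max_retries:
                raise  # give up after the last allowed retry
            sleep(backoff_delay(attempt))
            attempt += 1
```

The random jitter spreads retries from several workers apart, so they do not hit the site again in synchronized bursts. The `sleep` parameter is injectable so the retry logic can be tested without real waiting.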
Basic Setup
```bash
pip install selenium requests beautifulsoup4 undetected-chromedriver
```
Scraping with Selenium
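```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set up headless Chrome
opts = Options()
opts.add_argument("--headless")
opts.add_argument("--no-sandbox")
driver = webdriver.Chrome(options=opts)

# Product-card selector; Lazada's markup changes often, so verify it in DevTools first
ITEM = "[data-qa-locator='product-item']"

try:
    driver.get("https://www.lazada.vn/catalog/?q=laptop")
    # Wait up to 15 s for JS to render the product cards instead of a fixed sleep
    WebDriverWait(driver, 15).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ITEM))
    )
    products = driver.find_elements(By.CSS_SELECTOR, ITEM)
    for p in products[:10]:
        try:
            title = p.find_element(By.CSS_SELECTOR, ".title").text
            price = p.find_element(By.CSS_SELECTOR, ".price").text
            print(f"{title}: {price}")
        except NoSuchElementException:
            continue  # skip cards without a title or price (ads, placeholders)
finally:
    driver.quit()  # always close the browser, even if something above fails
```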
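Search keywords on lazada.vn are often Vietnamese, such as "điện thoại". It is safer to percent-encode them than to paste them into the URL by hand. Below is a small standard-library sketch. `build_search_url` is a hypothetical helper name. The `page` query parameter for pagination is an assumption about Lazada's catalog URLs, so confirm it in your browser before relying on it.

```python
from urllib.parse import urlencode

BASE = "https://www.lazada.vn/catalog/"


def build_search_url(keyword: str, page: int = 1) -> str:
    """Build a catalog search URL; non-ASCII keywords are percent-encoded as UTF-8.

    Assumption: Lazada paginates search results with a `page` query parameter.
    """
    params = {"q": keyword}
    if page > 1:
        params["page"] = str(page)
    return f"{BASE}?{urlencode(params)}"


# Example: visit the first three result pages with the Selenium driver
# for page in range(1, 4):
#     driver.get(build_search_url("điện thoại", page))
```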
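Handling Cloudflare
```python
import undetected_chromedriver as uc

# Patched Chrome driver that hides common automation fingerprints
driver = uc.Chrome()
driver.get("https://www.lazada.vn")
# This improves the odds of passing Cloudflare checks but does not guarantee a bypass;
# check the page content before scraping, and call driver.quit() when done
```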
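Whichever driver you use, check that you received the real page and not a challenge page before parsing it. Below is a heuristic sketch. "Just a moment..." is the title Cloudflare typically shows on its interstitial check. The `/cdn-cgi/challenge-platform/` script path is commonly seen on those pages. Treat both markers, and the helper name `looks_blocked`, as assumptions to tune against the pages you actually receive.

```python
# Assumed marker strings (lowercase); adjust them to what you observe in practice.
BLOCK_MARKERS = (
    "just a moment...",               # Cloudflare interstitial page title
    "/cdn-cgi/challenge-platform/",   # script path seen on Cloudflare challenge pages
)


def looks_blocked(html: str, markers: tuple = BLOCK_MARKERS) -> bool:
    """Return True if the HTML looks like a bot-check page rather than real content."""
    lowered = html.lower()
    return any(marker in lowered for marker in markers)


# Usage with either driver:
# if looks_blocked(driver.page_source):
#     ...back off, rotate the proxy, or retry later instead of parsing
```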
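Using a Proxy
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
# Chrome ignores user:pass@ in --proxy-server; use IP whitelisting or another auth method
opts.add_argument("--proxy-server=http://proxy.vinaproxy.com:8080")
driver = webdriver.Chrome(options=opts)
```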
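For plain HTTP fetches with `requests` (installed in the setup step), credentials can go directly in the proxy URL. Chrome's `--proxy-server` flag does not support this. The sketch below builds a `proxies` mapping and rotates through a pool round-robin. The hostnames, credentials and helper names (`proxy_url`, `rotating_proxies`) are placeholders, not real VinaProxy endpoints.

```python
from itertools import cycle
from typing import Dict, Iterator, List
from urllib.parse import quote


def proxy_url(host_port: str, user: str, password: str) -> str:
    """Build an authenticated proxy URL; credentials are percent-encoded so that
    characters like '@' or ':' in a password do not break the URL."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host_port}"


def rotating_proxies(hosts: List[str], user: str, password: str) -> Iterator[Dict[str, str]]:
    """Yield requests-style `proxies` dicts, cycling through the hosts round-robin."""
    for host in cycle(hosts):
        url = proxy_url(host, user, password)
        # requests chooses the proxy by the target URL's scheme; route both through it
        yield {"http": url, "https": url}


# Usage with requests (placeholder hosts and credentials):
# import requests
# pool = rotating_proxies(["proxy1.example.com:8080", "proxy2.example.com:8080"], "user", "pass")
# resp = requests.get("https://www.lazada.vn/", proxies=next(pool), timeout=30)
```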
Data Points
- Product name, description
- Original price, sale price
- Discount percentage
- Seller information
- Review count, rating
- Stock status
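To store the data points above consistently, you can normalize the scraped text into a typed record. This sketch makes two assumptions. First, prices arrive as display strings such as "₫12.990.000": Vietnamese dong, with dots as thousands separators. Second, the string holds a single price, not a range. The discount is computed from the original and sale prices rather than taken from the page. The `Product` fields and the `parse_vnd` helper are illustrative names, not Lazada's schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


def parse_vnd(text: str) -> Optional[int]:
    """Turn a single display price like '₫12.990.000' or '12.990.000 ₫' into 12990000.

    VND prices have no decimal part in practice, so every non-digit character
    (currency sign, dots, spaces) is dropped. Returns None if no digits remain.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


@dataclass
class Product:
    name: str
    sale_price: int                        # price currently charged, in VND
    original_price: Optional[int] = None   # crossed-out price, if shown
    description: str = ""
    seller: str = ""
    rating: Optional[float] = None
    review_count: int = 0
    in_stock: bool = True

    @property
    def discount_percent(self) -> float:
        """Discount relative to the original price, rounded to one decimal place."""
        if not self.original_price or self.original_price <= self.sale_price:
            return 0.0
        return round((self.original_price - self.sale_price) / self.original_price * 100, 1)
```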
VinaProxy for Lazada Scraping
- Bypass geo-restrictions
- Avoid IP bans
- Trusted residential IPs
- Priced at just $0.5/GB
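Residential proxies are usually billed by traffic, so it is worth estimating bandwidth before a large crawl. Below is a back-of-the-envelope sketch using the $0.5/GB price quoted above. The 2 MB average per rendered page is a placeholder assumption. Measure your own pages (for example in the browser's DevTools Network tab) and plug in the real figure. `estimate_proxy_cost` is a hypothetical helper name.

```python
def estimate_proxy_cost(pages: int, mb_per_page: float, usd_per_gb: float = 0.5) -> float:
    """Estimated proxy bill in USD, treating 1 GB as 1000 MB (decimal units)."""
    gigabytes = pages * mb_per_page / 1000
    return gigabytes * usd_per_gb


# 10,000 pages at an assumed 2 MB each -> 20 GB -> $10
print(f"${estimate_proxy_cost(10_000, 2.0):.2f}")  # prints $10.00
```

Images are a large share of a rendered product listing. Blocking them in the browser can often reduce the per-page figure, and therefore the bill.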
