Playwright for Web Scraping: Better Than Selenium?


Playwright is a newer browser automation framework from Microsoft. This article compares it with Selenium and shows how to use it for web scraping.

Playwright vs Selenium

Feature        Playwright                  Selenium
Speed          Generally faster            Generally slower
Auto-wait      Yes (built-in)              Manual (explicit waits)
Multi-browser  Chromium, Firefox, WebKit   Separate driver per browser
Headless       Default                     Requires configuration
API            Modern (sync and async)     Older, synchronous

Installation

# Python
pip install playwright
playwright install

# Node.js (the second command downloads the browser binaries)
npm install playwright
npx playwright install

Python Example

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    
    page.goto("https://example.com")
    
    # text_content() auto-waits for the <h1> element to appear
    title = page.locator("h1").text_content()
    print(title)
    
    # Screenshot
    page.screenshot(path="screenshot.png")
    
    browser.close()

Async Version

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")
        
        # Multiple elements
        links = await page.locator("a").all()
        for link in links:
            href = await link.get_attribute("href")
            print(href)
        
        await browser.close()

asyncio.run(main())
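
The async API pays off mainly when several pages load at once. The following is a minimal sketch of that idea, not a definitive implementation: it fetches page titles concurrently while a semaphore caps how many pages are open at the same time. The function names `fetch_title`, `scrape_titles`, and `run`, and the `max_concurrency` parameter, are illustrative and not part of Playwright.

```python
import asyncio


async def fetch_title(context, url):
    """Open url in a new page of the given browser context and return its title."""
    page = await context.new_page()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()


async def scrape_titles(context, urls, max_concurrency=3):
    """Fetch all titles concurrently, with at most max_concurrency pages open."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return url, await fetch_title(context, url)

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)


async def run(urls):
    # Imported here so the helpers above can be reused without launching a browser.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        try:
            return await scrape_titles(context, urls)
        finally:
            await browser.close()
```

Calling `asyncio.run(run(["https://example.com"]))` returns a dict that maps each URL to its page title. A low concurrency limit keeps memory use and the load on the target site modest.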

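Playwright With a Proxy

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={
            "server": "http://proxy.vinaproxy.com:8080",
            "username": "user",  # replace with your own credentials
            "password": "pass",
        },
    )
    page = browser.new_page()
    page.goto("https://example.com")
    browser.close()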
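
Proxy providers often hand out credentials as a single URL such as `http://user:pass@host:port`, whereas Playwright expects the `server`, `username`, and `password` fields separately, as in the snippet above. The helper below is a small sketch of that conversion; the name `proxy_settings` is illustrative and not part of Playwright.

```python
from urllib.parse import unquote, urlsplit


def proxy_settings(proxy_url):
    """Convert 'scheme://user:pass@host:port' into a Playwright proxy dict."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname or parts.port is None:
        raise ValueError(f"proxy URL needs a scheme, host and port: {proxy_url!r}")

    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials are optional; percent-encoded characters are decoded.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings
```

The result can be passed directly as the `proxy` argument of `p.chromium.launch(...)`.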

Stealth Mode

# Install playwright-stealth
pip install playwright-stealth

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    stealth_sync(page)  # Apply stealth
    page.goto("https://protected-site.com")

Advantages of Playwright

  • Auto-wait: explicit waits are rarely needed
  • Network interception: block images, modify requests
  • Multiple contexts: parallel browsing
  • Tracing: debug with the trace viewer
  • Video recording: record sessions
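
As an illustration of network interception, the sketch below aborts requests for images, media, and fonts so that pages load faster and use less bandwidth. The set of blocked types and the names `should_block`, `handle_route`, and `fetch_heading` are assumptions made for this example; `page.route`, `route.abort`, `route.continue_`, and `request.resource_type` are Playwright's Python API.

```python
# Resource types to skip; adjust to what the scraper actually needs.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_block(resource_type):
    """Return True for resource types the scraper does not need."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def handle_route(route):
    """Abort unwanted requests and let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def fetch_heading(url):
    """Load url with heavy resources blocked and return the first <h1> text."""
    # Imported here so the handler above can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle_route)  # intercept every request
        page.goto(url)
        heading = page.locator("h1").text_content()
        browser.close()
        return heading
```

Blocking by resource type rather than by URL pattern keeps the rule independent of how a particular site names its assets.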

When to Use Playwright?

  • JavaScript-heavy websites
  • SPAs (React, Vue, Angular)
  • Sites that require login or interaction
  • When Selenium is too slow
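
For sites that require a login, one possible approach is to sign in once and save the session so that later runs can skip the form. The sketch below assumes a hypothetical login page: the selectors `#username`, `#password`, `button[type=submit]`, and `#dashboard` are placeholders that must be adapted to the real site, and the function names are illustrative.

```python
def log_in(page, login_url, username, password):
    """Fill in and submit a login form, then wait for the logged-in view."""
    page.goto(login_url)
    page.fill("#username", username)       # hypothetical selector
    page.fill("#password", password)       # hypothetical selector
    page.click("button[type=submit]")      # hypothetical selector
    page.wait_for_selector("#dashboard")   # hypothetical post-login element


def save_logged_in_state(login_url, username, password, state_path="state.json"):
    """Log in once and store cookies and local storage in state_path."""
    # Imported here so log_in() can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        log_in(context.new_page(), login_url, username, password)
        context.storage_state(path=state_path)
        browser.close()
```

A later run can restore the session with `browser.new_context(storage_state="state.json")`, which avoids repeating the login on every scrape.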

VinaProxy + Playwright

  • Bypass geo-restrictions
  • Rotate IPs for large-scale scraping
  • Priced at just $0.5/GB
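
IP rotation can be sketched by giving each browser context its own proxy and cycling through a pool in round-robin order. The endpoints in `PROXIES` are hypothetical placeholders, and the function names are illustrative. The sketch assumes the provider exposes several distinct endpoints; a gateway that rotates IPs on its own would not need this.

```python
from itertools import cycle

# Hypothetical proxy endpoints, for illustration only.
PROXIES = [
    {"server": "http://proxy-1.example.com:8080"},
    {"server": "http://proxy-2.example.com:8080"},
]


def assign_proxies(urls, proxies):
    """Pair each URL with a proxy, cycling through the pool round-robin."""
    return list(zip(urls, cycle(proxies)))


def scrape_with_rotation(urls, proxies=PROXIES):
    """Fetch each URL through the next proxy in the pool and return its title."""
    # Imported here so assign_proxies() can be used without a browser.
    from playwright.sync_api import sync_playwright

    titles = {}
    with sync_playwright() as p:
        browser = p.chromium.launch()
        for url, proxy in assign_proxies(urls, proxies):
            # Each context is isolated and can carry its own proxy settings.
            context = browser.new_context(proxy=proxy)
            page = context.new_page()
            page.goto(url)
            titles[url] = page.title()
            context.close()
        browser.close()
    return titles
```

Using one context per request keeps cookies and cache from leaking between proxies, at the cost of a little extra setup time per page.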
