Proxy Configuration Guide for Web Scraping: Complete Setup

This guide shows how to configure a proxy in Python (Requests, aiohttp, Playwright, Selenium, Scrapy), Node.js (axios), and cURL.

Proxy URL Format

# Standard format
http://username:password@host:port

# VinaProxy example
http://user123:pass456@proxy.vinaproxy.com:8080

# With options (geo, session)
http://user123:pass456_country-vn@proxy.vinaproxy.com:8080
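
To make the format above concrete, here is a minimal sketch that splits a proxy URL into its parts using only the Python standard library. The helper name `parse_proxy_url` is hypothetical. The sketch treats the option suffix (such as `_country-vn`) as part of the password field, exactly as it appears in the example, and does not try to interpret it.

```python
from urllib.parse import urlsplit, unquote


def parse_proxy_url(proxy_url: str) -> dict:
    """Split a proxy URL into scheme, credentials, host, and port."""
    parts = urlsplit(proxy_url)
    return {
        "scheme": parts.scheme,
        # urlsplit leaves percent-escapes in place, so decode them explicitly.
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
    }


info = parse_proxy_url("http://user123:pass456_country-vn@proxy.vinaproxy.com:8080")
print(info["host"], info["port"])
```

A quick parse like this can catch a missing port or a malformed credential section before any request is sent.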

Python Requests

import requests

proxy = 'http://user:pass@proxy.vinaproxy.com:8080'

response = requests.get(
    'https://httpbin.org/ip',
    proxies={
        'http': proxy,
        'https': proxy
    }
)
print(response.json())  # Shows proxy IP

Python aiohttp (Async)

import aiohttp
import asyncio

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            'https://httpbin.org/ip',
            proxy='http://user:pass@proxy.vinaproxy.com:8080'
        ) as response:
            return await response.json()

result = asyncio.run(fetch())

Playwright

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            'server': 'http://proxy.vinaproxy.com:8080',
            'username': 'user123',
            'password': 'pass456'
        }
    )
    page = browser.new_page()
    page.goto('https://httpbin.org/ip')
    print(page.content())
    browser.close()

Selenium

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

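options = Options()
# Chrome's --proxy-server flag ignores credentials in the URL, so this
# only works if the proxy does not require username/password auth
options.add_argument('--proxy-server=http://proxy.vinaproxy.com:8080')

driver = webdriver.Chrome(options=options)
driver.get('https://httpbin.org/ip')
print(driver.page_source)  # Shows proxy IP
driver.quit()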

Scrapy

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# In the spider (a method of your scrapy.Spider subclass; needs `import scrapy`)
def start_requests(self):
    yield scrapy.Request(
        'https://httpbin.org/ip',
        meta={'proxy': 'http://user:pass@proxy.vinaproxy.com:8080'}
    )

Node.js (axios)

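const axios = require('axios');
// Recent versions of https-proxy-agent export the class by name
const { HttpsProxyAgent } = require('https-proxy-agent');

const agent = new HttpsProxyAgent('http://user:pass@proxy.vinaproxy.com:8080');

// proxy: false stops axios from applying its own proxy handling on top of the agent
axios.get('https://httpbin.org/ip', { httpsAgent: agent, proxy: false })
    .then(res => console.log(res.data));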

cURL

# Command line
curl -x http://user:pass@proxy.vinaproxy.com:8080 https://httpbin.org/ip

# With SOCKS5
curl -x socks5://user:pass@proxy.vinaproxy.com:1080 https://httpbin.org/ip

Environment Variables

# .env file
PROXY_URL=http://user:pass@proxy.vinaproxy.com:8080

# Python
import os
from dotenv import load_dotenv

load_dotenv()
proxy = os.getenv('PROXY_URL')

# Or system-wide (shell)
export HTTP_PROXY=http://user:pass@proxy.vinaproxy.com:8080
export HTTPS_PROXY=http://user:pass@proxy.vinaproxy.com:8080
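
To check which proxy settings a Python process actually sees after the exports above, the standard library's `urllib.request.getproxies()` can be queried. The sketch below is one way to do that; `proxies_from_environment` is a hypothetical helper name. Requests reads these variables by default, while aiohttp only does so when the session is created with `trust_env=True`.

```python
from urllib.request import getproxies


def proxies_from_environment() -> dict:
    """Return the HTTP/HTTPS proxies visible to this process, if any."""
    detected = getproxies()  # keys are lowercase schemes, e.g. 'http', 'https'
    return {scheme: detected[scheme] for scheme in ("http", "https") if scheme in detected}


if __name__ == "__main__":
    # An empty dict means neither HTTP_PROXY nor HTTPS_PROXY is set.
    print(proxies_from_environment())
```

The returned dict has the same shape as the `proxies=` argument used in the Requests example, so it can be passed straight through.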

Testing That the Proxy Works

import requests

def test_proxy(proxy_url):
    try:
        response = requests.get(
            'https://httpbin.org/ip',
            proxies={'http': proxy_url, 'https': proxy_url},
            timeout=10
        )
        print(f"✅ Working! IP: {response.json()['origin']}")
        return True
    except Exception as e:
        print(f"❌ Failed: {e}")
        return False

test_proxy('http://user:pass@proxy.vinaproxy.com:8080')
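
If Requests is not installed, roughly the same check can be sketched with the standard library alone. The helper names `build_proxy_opener` and `exit_ip` are hypothetical, and the sketch assumes the proxy accepts credentials embedded in the URL, as in the examples above.

```python
import json
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Create an opener that routes HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def exit_ip(proxy_url: str, timeout: float = 10.0) -> str:
    """Return the IP address that httpbin.org sees for requests sent via the proxy."""
    opener = build_proxy_opener(proxy_url)
    with opener.open("https://httpbin.org/ip", timeout=timeout) as response:
        return json.load(response)["origin"]
```

Calling `exit_ip('http://user:pass@proxy.vinaproxy.com:8080')` should return the proxy's address rather than your own. Network and authentication errors propagate as exceptions, so wrap the call in `try`/`except` the same way `test_proxy` does.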

Error Handling

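# Common errors
# 407 Proxy Authentication Required → check username/password
# Connection refused → check host/port
# Timeout → proxy is slow or blocked

import requests

# Retry with a different proxy
def request_with_fallback(url, proxies):
    for proxy in proxies:
        try:
            # Set both schemes so HTTPS URLs also go through the proxy
            return requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)
        except requests.RequestException:
            continue  # This proxy failed; try the next one
    return None  # Every proxy failed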
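
The three common errors listed above can also be turned into a small lookup that produces a hint for logs. This is only a sketch: `explain_proxy_failure` is a hypothetical helper, and it matches Python's built-in exception types. Requests raises its own exception classes (for example `requests.exceptions.ProxyError` and `requests.exceptions.ConnectTimeout`), so a version for Requests would need to check those instead.

```python
from typing import Optional


def explain_proxy_failure(status_code: Optional[int] = None,
                          error: Optional[BaseException] = None) -> str:
    """Map a status code or built-in exception to a troubleshooting hint."""
    if status_code == 407:
        return "Proxy authentication failed: check the username and password."
    if isinstance(error, ConnectionRefusedError):
        return "Connection refused: check the proxy host and port."
    if isinstance(error, TimeoutError):
        return "Timed out: the proxy may be slow or blocked."
    return "Unrecognized failure: inspect the status code and exception."


print(explain_proxy_failure(status_code=407))
```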

VinaProxy Setup

  • Host: proxy.vinaproxy.com
  • Port: 8080 (HTTP) / 1080 (SOCKS5)
  • Auth: username:password from the dashboard
  • Price: $0.5/GB
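
The settings above can be collected in one place and turned into proxy URLs on demand. The sketch below is one possible shape, with `ProxySettings` as a hypothetical class name. It percent-encodes the credentials, which is an added precaution not described in the original, so that characters such as `@` or `:` in a password cannot break the URL.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class ProxySettings:
    username: str
    password: str
    host: str = "proxy.vinaproxy.com"
    http_port: int = 8080
    socks5_port: int = 1080

    def _auth(self) -> str:
        # Percent-encode both fields; safe='' also escapes '/', '@' and ':'.
        return f"{quote(self.username, safe='')}:{quote(self.password, safe='')}"

    @property
    def http_url(self) -> str:
        return f"http://{self._auth()}@{self.host}:{self.http_port}"

    @property
    def socks5_url(self) -> str:
        return f"socks5://{self._auth()}@{self.host}:{self.socks5_port}"


settings = ProxySettings(username="user123", password="pass456")
print(settings.http_url)
print(settings.socks5_url)
```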
