Proxy Configuration Guide for Web Scraping: A Complete Setup
This article shows, step by step, how to set up a proxy in the most common scraping tools and languages.
Proxy URL Format
# Standard format
http://username:password@host:port
# VinaProxy example
http://user123:pass456@proxy.vinaproxy.com:8080
# With options (geo, session)
http://user123:pass456_country-vn@proxy.vinaproxy.com:8080
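Credentials that contain characters such as "@" or ":" break the URL format above unless they are percent-encoded. The sketch below shows one way to assemble the URL safely using only the standard library. The function name `build_proxy_url` and its `country` parameter are illustrative assumptions; the "_country-<code>" suffix simply mirrors the example above, so confirm the exact option syntax with your provider.

```python
from urllib.parse import quote


def build_proxy_url(username, password, host, port, country=None, scheme="http"):
    """Assemble scheme://username:password@host:port with encoded credentials."""
    if country:
        # Assumption: options are appended to the password as "_country-<code>",
        # mirroring the example URL in this article.
        password = f"{password}_country-{country}"
    # safe="" percent-encodes reserved characters such as "@", ":" and "/"
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


print(build_proxy_url("user123", "pass456", "proxy.vinaproxy.com", 8080))
print(build_proxy_url("user123", "p@ss:word", "proxy.vinaproxy.com", 8080))
```

The second call prints a URL whose password reads `p%40ss%3Aword`, which HTTP clients decode back to the original value.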
Python Requests
import requests

proxy = 'http://user:pass@proxy.vinaproxy.com:8080'

response = requests.get(
    'https://httpbin.org/ip',
    proxies={
        'http': proxy,
        'https': proxy
    }
)
print(response.json())  # Shows proxy IP
Python aiohttp (Async)
import aiohttp
import asyncio

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            'https://httpbin.org/ip',
            proxy='http://user:pass@proxy.vinaproxy.com:8080'
        ) as response:
            return await response.json()

result = asyncio.run(fetch())
Playwright
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            'server': 'http://proxy.vinaproxy.com:8080',
            'username': 'user123',
            'password': 'pass456'
        }
    )
    page = browser.new_page()
    page.goto('https://httpbin.org/ip')
    print(page.content())
    browser.close()
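Playwright takes the server and the credentials as separate fields, while most other tools in this guide take a single URL. If you keep only one proxy URL in your configuration, a small helper can split it into the Playwright shape. This is a minimal sketch using the standard library; `split_proxy_url` is a hypothetical helper name, not part of Playwright.

```python
from urllib.parse import unquote, urlsplit


def split_proxy_url(proxy_url):
    """Turn scheme://user:pass@host:port into a Playwright-style proxy dict."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy URL must include a host and a port")
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials may be percent-encoded in the URL, so decode them here
    if parts.username is not None:
        settings["username"] = unquote(parts.username)
    if parts.password is not None:
        settings["password"] = unquote(parts.password)
    return settings


print(split_proxy_url("http://user123:pass456@proxy.vinaproxy.com:8080"))
```

The returned dict can be passed as the `proxy` argument of `p.chromium.launch(...)`.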
Selenium
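from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Chrome's --proxy-server flag does not accept a username/password in the URL,
# so this form suits proxies that need no credentials (for example, when your
# provider authenticates by allowlisted IP).
options.add_argument('--proxy-server=http://proxy.vinaproxy.com:8080')

driver = webdriver.Chrome(options=options)
driver.get('https://httpbin.org/ip')
print(driver.page_source)
driver.quit()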
Scrapy
# settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# In the spider class (requires `import scrapy` at the top of the module)
def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(
            url,
            meta={'proxy': 'http://user:pass@proxy.vinaproxy.com:8080'}
        )
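Setting `meta['proxy']` on every request gets repetitive once a project has several spiders or several proxies. One possible approach is a small custom downloader middleware that fills in the proxy for you. The sketch below is an illustration, not a built-in Scrapy component: the class name `RotatingProxyMiddleware` and the `PROXY_LIST` setting are assumptions made for this example.

```python
from itertools import cycle


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware that assigns proxies round-robin."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("PROXY_LIST must contain at least one proxy URL")
        self._proxies = cycle(proxy_urls)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting, not a built-in Scrapy one
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider=None):
        # Leave the request alone if the spider already chose a proxy
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._proxies)
        return None  # let Scrapy continue processing the request
```

To use it, you would register the class in `DOWNLOADER_MIDDLEWARES` with a priority number lower than the one given to `HttpProxyMiddleware`, so that the proxy is already set when `HttpProxyMiddleware` reads it.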
Node.js (axios)
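const axios = require('axios');
// https-proxy-agent v6 and later use a named export; older versions export the class directly.
const { HttpsProxyAgent } = require('https-proxy-agent');

const agent = new HttpsProxyAgent('http://user:pass@proxy.vinaproxy.com:8080');

// proxy: false stops axios from applying its own proxy handling on top of the agent.
axios.get('https://httpbin.org/ip', { httpsAgent: agent, proxy: false })
  .then(res => console.log(res.data))
  .catch(err => console.error(err.message));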
cURL
# Command line
curl -x http://user:pass@proxy.vinaproxy.com:8080 https://httpbin.org/ip
# With SOCKS5
curl --socks5 user:pass@proxy.vinaproxy.com:1080 https://httpbin.org/ip
Environment Variables
# .env file
PROXY_URL=http://user:pass@proxy.vinaproxy.com:8080
# Python
import os
from dotenv import load_dotenv
load_dotenv()
proxy = os.getenv('PROXY_URL')
# Or system-wide (shell)
export HTTP_PROXY=http://user:pass@proxy.vinaproxy.com:8080
export HTTPS_PROXY=http://user:pass@proxy.vinaproxy.com:8080
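Note that `requests` picks up `HTTP_PROXY` and `HTTPS_PROXY` on its own, whereas a custom variable such as `PROXY_URL` has to be passed explicitly. If you want a single place that resolves all three, a helper along these lines could work. It is a sketch under assumptions: the name `proxies_from_env` and the precedence (scheme-specific variable first, then `PROXY_URL`) are choices made for this example.

```python
import os


def proxies_from_env(env=None):
    """Build a requests-style proxies dict from environment variables."""
    env = os.environ if env is None else env
    fallback = env.get("PROXY_URL")  # the custom variable from the .env example
    proxies = {}
    for scheme in ("http", "https"):
        name = f"{scheme.upper()}_PROXY"
        # Check the upper-case name first, then the lower-case variant
        value = env.get(name) or env.get(name.lower()) or fallback
        if value:
            proxies[scheme] = value
    return proxies


print(proxies_from_env({"PROXY_URL": "http://user:pass@proxy.vinaproxy.com:8080"}))
```

The result can be passed directly as `proxies=` to `requests.get`. An empty dict means no proxy variable was found.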
Testing That the Proxy Works
import requests

def test_proxy(proxy_url):
    try:
        response = requests.get(
            'https://httpbin.org/ip',
            proxies={'http': proxy_url, 'https': proxy_url},
            timeout=10
        )
        response.raise_for_status()  # treat 4xx/5xx responses as failures
        print(f"✅ Working! IP: {response.json()['origin']}")
        return True
    except Exception as e:
        print(f"❌ Failed: {e}")
        return False

test_proxy('http://user:pass@proxy.vinaproxy.com:8080')
Error Handling
# Common errors
# 407 Proxy Authentication Required → check username/password
# Connection refused → check host/port
# Timeout → proxy is slow or blocked

# Retry with a different proxy
import requests

def request_with_fallback(url, proxies):
    for proxy in proxies:
        try:
            # Set both schemes so HTTPS URLs also go through the proxy
            return requests.get(
                url,
                proxies={'http': proxy, 'https': proxy},
                timeout=10
            )
        except requests.RequestException:
            continue
    return None
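A single pass over the proxy list gives up quickly when failures are temporary. The sketch below extends the fallback idea with several passes and an exponential pause between them. To keep it independent of any HTTP library, the actual request is delegated to a caller-supplied `fetch(url, proxy)` function. The name `fetch_with_rotation` and its parameters are assumptions for illustration.

```python
import time


def fetch_with_rotation(url, proxies, fetch, rounds=2, backoff=1.0, sleep=time.sleep):
    """Try each proxy in turn, for up to `rounds` passes; return the first success.

    `fetch(url, proxy)` is supplied by the caller and must return a response
    or raise an exception on failure.
    """
    last_error = None
    attempts = 0
    for attempt in range(rounds):
        for proxy in proxies:
            try:
                return fetch(url, proxy)
            except Exception as exc:  # this sketch treats any failure as retryable
                last_error = exc
                attempts += 1
        if attempt < rounds - 1:
            # Pause before the next pass: backoff, 2 * backoff, 4 * backoff, ...
            sleep(backoff * 2 ** attempt)
    raise RuntimeError(f"all {attempts} attempts failed for {url}") from last_error
```

With `requests`, `fetch` could be `lambda url, proxy: requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)`. Raising at the end, instead of returning `None`, keeps the last underlying error visible to the caller.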
VinaProxy Setup
- Host: proxy.vinaproxy.com
- Port: 8080 (HTTP) / 1080 (SOCKS5)
- Auth: username:password from the dashboard
- Price: $0.5/GB
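Because pricing is per gigabyte, a rough cost estimate follows from the number of pages and their average size. The sketch below is only a back-of-the-envelope calculation: it assumes 1 GB equals 1,000,000 KB and ignores request headers and retries, and how the provider actually meters traffic is not stated here. The function name `estimate_cost` is hypothetical.

```python
PRICE_PER_GB = 0.5  # USD, the price listed in this article


def estimate_cost(pages, avg_page_kb, price_per_gb=PRICE_PER_GB):
    """Estimate proxy cost in USD, treating 1 GB as 1,000,000 KB (an assumption)."""
    gigabytes = pages * avg_page_kb / 1_000_000
    return gigabytes * price_per_gb


# 100,000 pages at roughly 200 KB each is about 20 GB of traffic
print(f"${estimate_cost(100_000, 200):.2f}")
```

For that workload the estimate comes to $10.00.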
