Handling Timeouts in Python Requests: A Detailed Guide

Does your scraping script hang without responding? You may have run into a timeout problem in Python Requests. This article shows how to handle it properly.

What Is a Timeout?

A timeout is a limit on how long the client waits for a response from the server. Without one, a script can hang indefinitely.

Common Mistake: Not Setting a Timeout

By default, requests.get(url) has no timeout. If the server never responds, the script waits forever.
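
One way to avoid forgetting the timeout is to apply a default at the session level. The sketch below is only an illustration: TimeoutSession and default_timeout are hypothetical names (requests ships no such class), and it relies on requests.Session.request accepting a timeout keyword argument.

```python
import requests


class TimeoutSession(requests.Session):
    """Session that applies a default timeout when the caller gives none.

    Hypothetical helper for illustration; not part of the requests library.
    """

    def __init__(self, default_timeout=5):
        super().__init__()
        self.default_timeout = default_timeout

    def request(self, method, url, **kwargs):
        # Only fill in the timeout if the caller did not pass one explicitly.
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)
```

With session = TimeoutSession(default_timeout=(3, 10)), a call such as session.get(url) would use the (3, 10) timeout, while session.get(url, timeout=1) still overrides it.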

Setting Timeouts Correctly

1. Simple Timeout

import requests

# A single value applies to the connect and read phases separately: 5 seconds each
response = requests.get('https://example.com', timeout=5)

2. Separate Connect and Read Timeouts

# timeout=(connect_timeout, read_timeout)
# 3 seconds to establish the connection, then up to 10 seconds of waiting
# between bytes sent by the server
response = requests.get('https://example.com', timeout=(3, 10))
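
Keep in mind that the read timeout in requests limits the wait between bytes from the server, not the total download time, so a server that trickles data can keep a request alive far longer. If an overall limit matters, something like the sketch below may help. The function name fetch_with_deadline and its parameters are hypothetical, and the deadline is only checked after each chunk arrives, so it is approximate.

```python
import time

import requests


def fetch_with_deadline(url, deadline_s=30.0, timeout=(3, 10), chunk_size=8192):
    """Download url, giving up once roughly deadline_s seconds have elapsed.

    Hypothetical helper: the per-read timeout still applies to each chunk.
    """
    start = time.monotonic()
    chunks = []
    # stream=True defers the body download so it can be read chunk by chunk.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=chunk_size):
            chunks.append(chunk)
            if time.monotonic() - start > deadline_s:
                raise TimeoutError(f"exceeded overall deadline of {deadline_s}s")
    return b"".join(chunks)
```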

3. Handling Exceptions

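import requests
from requests.exceptions import Timeout, ConnectionError

url = 'https://example.com'  # placeholder target

try:
    response = requests.get(url, timeout=5)
except Timeout:
    # Raised for both connect and read timeouts
    print("Request timed out!")
except ConnectionError:
    # DNS failures, refused connections, and similar network errors
    print("Connection failed!")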
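
When it helps to know which phase timed out, requests also exposes ConnectTimeout and ReadTimeout, both subclasses of Timeout. The following is a minimal sketch of telling them apart. The function name fetch_status and the returned labels are made up for illustration.

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, RequestException


def fetch_status(url, timeout=(3, 10)):
    """Return a short label describing how the request ended (hypothetical helper)."""
    try:
        response = requests.get(url, timeout=timeout)
    except ConnectTimeout:
        # The connection could not be established within the connect timeout.
        return "connect-timeout"
    except ReadTimeout:
        # Connected, but the server did not send data within the read timeout.
        return "read-timeout"
    except RequestException as exc:
        # Any other requests error (connection refused, invalid URL, ...).
        return f"error: {type(exc).__name__}"
    return f"http {response.status_code}"
```

A connect timeout is often a reasonable candidate for an immediate retry, while a read timeout may mean the server already started processing the request.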

Recommended Timeout Values

2-3 seconds: fast internal APIs
5 seconds: a balanced default
10-15 seconds: slow servers or heavy processing
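
These recommendations could be kept in one place rather than scattered through the code. The sketch below assumes hypothetical profile names and picks the upper end of each recommended range. Adjust the numbers to your own measurements.

```python
# Hypothetical profile names mapped to timeouts in seconds,
# using the upper end of each recommended range.
TIMEOUT_PROFILES = {
    "internal_api": 3,   # fast internal APIs: 2-3 seconds
    "default": 5,        # balanced default: 5 seconds
    "slow_server": 15,   # slow servers or heavy processing: 10-15 seconds
}


def timeout_for(profile):
    """Return the timeout for a profile, falling back to the default."""
    return TIMEOUT_PROFILES.get(profile, TIMEOUT_PROFILES["default"])
```

A call would then look like requests.get(url, timeout=timeout_for("slow_server")).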

Smart Retries

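import random
import time

import requests
from requests.exceptions import Timeout

def request_with_retry(url, max_retries=3):
    for i in range(max_retries):
        try:
            return requests.get(url, timeout=5)
        except Timeout:
            if i == max_retries - 1:
                break  # no point sleeping after the final attempt
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of random jitter
            wait = (2 ** i) + random.uniform(0, 1)
            time.sleep(wait)
    raise Exception("Max retries exceeded")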
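
As an alternative to a hand-written loop, retries can be delegated to urllib3's Retry class through an HTTPAdapter. This is a different technique from the function above, shown as a sketch: make_retrying_session and its defaults are hypothetical, the exact delay schedule depends on the installed urllib3 version, and a timeout still has to be passed on each request.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Build a session whose adapter retries failed connects and reads.

    Hypothetical helper; the parameter values are illustrative defaults.
    """
    retry = Retry(
        total=total,                    # overall cap on retries
        connect=total,                  # retries for connection errors
        read=total,                     # retries for read errors
        backoff_factor=backoff_factor,  # base for the exponential delay
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Use the same retrying adapter for both schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Usage would be session = make_retrying_session() followed by session.get(url, timeout=5).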

Combining With a Proxy

import requests

url = 'https://example.com'  # placeholder target

proxies = {
    'http': 'http://user:pass@proxy.vinaproxy.com:8080',
    'https': 'http://user:pass@proxy.vinaproxy.com:8080'
}

response = requests.get(
    url,
    proxies=proxies,
    timeout=(5, 30)  # Longer timeouts when going through a proxy
)
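
If the proxy username or password contains characters such as '@' or ':', embedding them directly in the URL can break parsing. The sketch below builds the proxies mapping with percent-encoded credentials. build_proxies is a hypothetical helper, and proxy.example.com in the usage note is a placeholder host.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port):
    """Return a requests-style proxies dict with percent-encoded credentials."""
    # safe='' makes quote() encode every reserved character, including '@' and ':'.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}:{port}"
    # The same HTTP proxy endpoint handles both plain and TLS traffic.
    return {"http": proxy_url, "https": proxy_url}
```

The result can be passed straight to requests, for example requests.get(url, proxies=build_proxies("user", "p@ss", "proxy.example.com", 8080), timeout=(5, 30)).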

VinaProxy: Stable Proxies for Scraping

High uptime, few timeouts
Vietnamese residential IPs
Only $0.5/GB
