Cookies and Sessions in Web Scraping

Many websites need cookies or sessions to work properly. This article shows how to manage cookies when scraping.

Why Do Cookies Matter?

  • Authentication: keeps you logged in across requests
  • Anti-bot: sites use cookies to track behavior
  • Personalization: stores location and language preferences
  • Cart/Wishlist: holds e-commerce session state

Python Requests Sessions

import requests

# Create a session; it stores and resends cookies automatically
session = requests.Session()

# First request: the server sets cookies
session.get('https://example.com')

# Later requests: the stored cookies are sent automatically
response = session.get('https://example.com/products')

# Inspect the cookies held by the session
print(session.cookies.get_dict())

Logging In With a Session

import requests

session = requests.Session()

# Fetch the login page first (it sets cookies and may carry a CSRF token)
login_page = session.get('https://example.com/login')

# Extract the CSRF token if the site uses one
# (extract_csrf is a helper you provide yourself)
# csrf = extract_csrf(login_page.text)

# Log in
login_data = {
    'username': 'user@email.com',
    'password': 'password123',
    # 'csrf_token': csrf
}
response = session.post('https://example.com/login', data=login_data)
response.raise_for_status()  # stop early on an HTTP error status

# The session can now access protected pages
profile = session.get('https://example.com/profile')
print(profile.text)
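
The login example leaves `extract_csrf` undefined. Below is a minimal sketch of such a helper using only the standard library. It assumes the token lives in an `<input>` element named `csrf_token`; that field name is an assumption taken from the commented-out line above, and real sites often use a different name or put the token in a meta tag or a cookie instead.

```python
from html.parser import HTMLParser


class _CsrfInputParser(HTMLParser):
    """Records the value of the first <input> whose name equals field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-input tags and anything after the first match
        if tag != "input" or self.token is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == self.field_name:
            self.token = attr_map.get("value")


def extract_csrf(html, field_name="csrf_token"):
    """Return the CSRF field's value, or None if the field is not present."""
    parser = _CsrfInputParser(field_name)
    parser.feed(html)
    return parser.token


# Hypothetical login form used only for illustration
sample_html = '<form><input type="hidden" name="csrf_token" value="t0k3n"></form>'
print(extract_csrf(sample_html))
```

If the helper returns None, the page probably delivers its token some other way, so it is worth inspecting the login form in the browser before relying on this.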

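Setting Cookies Manually

import requests

session = requests.Session()
url = 'https://example.com/products'

# Set specific cookies on the session (sent with every later request)
session.cookies.set('session_id', 'abc123')
session.cookies.set('user_token', 'xyz789')

# Or pass a dict for a single request
# (cookies passed this way are not stored on the session)
cookies = {
    'session_id': 'abc123',
    'language': 'vi',
    'currency': 'VND'
}
response = session.get(url, cookies=cookies)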
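
A common source of manual cookies is a `Cookie` header copied from the browser's developer tools. The sketch below turns such a string into a dict that could be passed as `cookies=`; the function name `parse_cookie_header` and the sample string are illustrative assumptions, not part of any library.

```python
def parse_cookie_header(header):
    """Turn a string like 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in header.split(";"):
        # partition splits on the first '=' only, so values containing '=' survive
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Example string in the shape a browser shows for the Cookie request header
copied = "session_id=abc123; language=vi; currency=VND"
print(parse_cookie_header(copied))
```

The resulting dict should also be usable with `session.cookies.update(...)` if the cookies need to persist across requests.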

Export/Import Cookies

import requests
import pickle

session = requests.Session()
session.get('https://example.com')

# Save cookies
with open('cookies.pkl', 'wb') as f:
    pickle.dump(session.cookies, f)

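# Load cookies later (only unpickle files you created yourself;
# pickle can execute arbitrary code from untrusted data)
with open('cookies.pkl', 'rb') as f:
    session.cookies.update(pickle.load(f))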

Selenium Cookies

from selenium import webdriver
import json

driver = webdriver.Chrome()
driver.get('https://example.com')

# Get all cookies
cookies = driver.get_cookies()
print(cookies)

# Save cookies
with open('cookies.json', 'w') as f:
    json.dump(cookies, f)

# Load cookies (the browser must already be on the cookies' domain,
# otherwise add_cookie raises an error)
with open('cookies.json', 'r') as f:
    cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
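
Cookies collected in Selenium are often reused in a lighter HTTP client after a browser-based login. The sketch below converts the list returned by `driver.get_cookies()` into a plain name/value dict, optionally keeping only cookies that apply to one host. The function name and the sample data are assumptions for illustration, and the domain check is deliberately simplified compared with real browser matching rules.

```python
def selenium_cookies_to_dict(cookies, domain=None):
    """Map Selenium cookie dicts to {name: value}, optionally filtered by host."""
    result = {}
    for cookie in cookies:
        # Selenium may report domains with a leading dot, e.g. '.example.com'
        cookie_domain = cookie.get("domain", "").lstrip(".")
        if domain is not None:
            matches = domain == cookie_domain or domain.endswith("." + cookie_domain)
            if not matches:
                continue
        result[cookie["name"]] = cookie["value"]
    return result


# Sample data in the shape returned by driver.get_cookies()
browser_cookies = [
    {"name": "session_id", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "tracker", "value": "zzz", "domain": "ads.other.net", "path": "/"},
]
print(selenium_cookies_to_dict(browser_cookies, domain="www.example.com"))
```

The returned dict can then be handed to a `requests.Session` through `session.cookies.update(...)`.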

Cookie Security Notes

  • HttpOnly: not accessible from JavaScript
  • Secure: sent only over HTTPS
  • SameSite: CSRF protection (restricts sending on cross-site requests)
  • Expiry: check the cookie's lifetime
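
To see these attributes on a real response, one option is to parse the `Set-Cookie` header with the standard library. This is a minimal sketch: the helper name `describe_set_cookie` and the sample header are assumptions, and reading `SameSite` this way requires Python 3.8 or newer.

```python
from http.cookies import SimpleCookie


def describe_set_cookie(header_value):
    """Summarize the security attributes of each cookie in a Set-Cookie value."""
    jar = SimpleCookie()
    jar.load(header_value)
    summary = {}
    for name, morsel in jar.items():
        summary[name] = {
            "value": morsel.value,
            "http_only": bool(morsel["httponly"]),  # flag attribute
            "secure": bool(morsel["secure"]),       # flag attribute
            "same_site": morsel["samesite"] or None,
            "max_age": int(morsel["max-age"]) if morsel["max-age"] else None,
        }
    return summary


# Hypothetical header value for illustration
header = "session_id=abc123; Path=/; Max-Age=3600; HttpOnly; Secure; SameSite=Lax"
print(describe_set_cookie(header))
```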

Best Practices

  • Use a Session object rather than standalone requests calls
  • Save cookies so you can resume sessions
  • Respect cookie expiry
  • Rotate sessions when you get blocked
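
Respecting expiry when resuming from saved cookies can be sketched as a filter that runs before the cookies are loaded back. This assumes cookies in the Selenium dict format, where `expiry` is a Unix timestamp in seconds and is missing for session cookies; the function name `drop_expired` is hypothetical.

```python
import time


def drop_expired(cookies, now=None):
    """Keep session cookies (no 'expiry' key) and cookies that expire after now."""
    now = time.time() if now is None else now
    return [c for c in cookies if "expiry" not in c or c["expiry"] > now]


# Sample saved cookies; timestamps are arbitrary illustration values
saved = [
    {"name": "session_id", "value": "abc123", "expiry": 2_000_000_000},
    {"name": "old_token", "value": "stale", "expiry": 1_000_000_000},
    {"name": "temp", "value": "1"},  # session cookie: no expiry field
]
fresh = drop_expired(saved, now=1_700_000_000)
print([c["name"] for c in fresh])
```

If a key cookie such as the session ID has been dropped, logging in again is usually safer than retrying with the stale state.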

VinaProxy + Session Management

  • Sticky sessions that keep the same IP
  • Rotating sessions when needed
  • Only $0.5/GB
