Web Scraping Glossary: 50+ Terms You Need to Know

admin

February 5, 2026

A roundup of the key web scraping terms, from A to Z.

A

AJAX: Asynchronous JavaScript And XML, a technique for loading data without refreshing the page
Anti-bot: A system that detects and blocks bots and scrapers
API: Application Programming Interface, the official way to get data from a service
Async: Asynchronous execution, handling many requests at once without waiting for each one to finish before starting the next
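
To make the "Async" entry concrete, here is a minimal sketch using Python's standard asyncio module. The fetch function is a hypothetical stand-in that simulates network latency with a short sleep instead of sending a real request, and the example.com URLs are placeholders.

```python
import asyncio

async def fetch(url: str) -> str:
    # Stand-in for a real HTTP call; asyncio.sleep simulates network latency.
    await asyncio.sleep(0.1)
    return f"response from {url}"

async def fetch_all(urls: list[str]) -> list[str]:
    # gather() runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(1, 4)]
results = asyncio.run(fetch_all(urls))
print(results)
```

Because the three simulated requests overlap, the whole batch takes roughly as long as one request rather than three.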

B

BeautifulSoup: A popular Python library for parsing HTML
Bot: A program that performs tasks automatically
Browser automation: Controlling a browser through code
Bandwidth: The amount of data transferred, usually measured in GB

C

CAPTCHA: A test that distinguishes humans from bots
Cloudflare: A CDN and security service that often blocks scrapers
Crawler: A bot that follows links to index websites
CSS Selector: A pattern for selecting HTML elements (.class, #id)
Concurrent: Running multiple requests at the same time
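
The "Crawler" entry can be sketched as a breadth-first walk over links. This is only an illustration: the PAGES dictionary is a hypothetical in-memory site so the code runs without network access, and LinkExtractor and crawl are names invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.com/b": '<a href="/">Home</a>',
    "https://example.com/c": "<p>No links here</p>",
}

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url):
    """Visit pages breadth-first, following each link only once."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor()
        parser.feed(PAGES.get(url, ""))
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order

visited = crawl("https://example.com/")
print(visited)
```

The seen set is what keeps the crawler from looping when pages link back to each other.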

D

Datacenter proxy: A proxy hosted in a data center; fast but easy to detect
DOM: Document Object Model, the tree structure of an HTML document
Dynamic content: Content rendered by JavaScript

E-F

ETL: Extract, Transform, Load, a data pipeline pattern
Exponential backoff: Increasing the delay after each retry, typically by doubling it
Fingerprinting: Identifying a browser or device through its unique traits
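
One way the "Exponential backoff" entry might look in code is shown below. The doubling factor, the 30-second cap, and the 10% jitter are assumptions chosen for illustration, and flaky is a hypothetical function that fails twice before succeeding. The sleep function is injectable so the demo records the waits instead of actually pausing.

```python
import random
import time

def backoff_delays(retries, base=1.0, factor=2.0, cap=30.0):
    """Delay before each retry: base * factor**attempt, never above cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]

def retry_with_backoff(func, retries=4, base=1.0, sleep=time.sleep):
    """Call func(); on failure, wait with backoff plus jitter and try again."""
    delays = backoff_delays(retries, base=base)
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Add up to 10% random jitter so clients do not retry in lockstep.
            sleep(delays[attempt] + random.uniform(0, 0.1 * delays[attempt]))

attempts = {"count": 0}

def flaky():
    # Hypothetical operation that fails on the first two calls.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

waits = []
result = retry_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```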

G-H

Geo-targeting: Choosing the proxy's location
GraphQL: A query language for APIs
Headless browser: A browser with no visible UI, driven by code or from the command line
HTML: HyperText Markup Language, the structure of a web page
HTTP: HyperText Transfer Protocol, the protocol for transferring data on the web

I-J

IP address: An address that identifies a device on the internet
IP rotation: Changing IPs continuously to avoid bans
JavaScript rendering: Running JS to get the final content
JSON: JavaScript Object Notation, a widely used data format
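
As a rough illustration of the "IP rotation" entry, the sketch below assigns proxies to requests in round-robin order. The proxy addresses are placeholders from the 203.0.113.0/24 documentation range, and assign_proxies is a hypothetical helper; a real setup would pass each pair to an HTTP client.

```python
from itertools import cycle

# Placeholder proxy addresses (documentation range, not real servers).
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in the pool, wrapping around."""
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]

urls = [f"https://example.com/item/{i}" for i in range(5)]
plan = assign_proxies(urls, PROXIES)
for url, proxy in plan:
    print(url, "via", proxy)
```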

L-M

Lazy loading: Loading content only when the user scrolls to it
lxml: A fast HTML/XML parser for Python
Middleware: Code that runs between the request and the response
Mobile proxy: A proxy that uses mobile carrier IPs (3G/4G/5G)

P

Pagination: Splitting content across multiple pages
Parser: Code that reads HTML and extracts data from it
Playwright: A modern browser automation library
Proxy: An intermediary server that hides your real IP
Puppeteer: A Node.js browser automation library
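
For the "Pagination" entry, a scraper typically requests page after page until one comes back empty. The sketch below assumes a 1-indexed page parameter and uses a hypothetical fetch_page that slices an in-memory list rather than calling a real endpoint.

```python
# Hypothetical data source: 7 items served 3 per page.
ITEMS = [f"item-{i}" for i in range(1, 8)]
PAGE_SIZE = 3

def fetch_page(page):
    """Stand-in for an HTTP request such as GET /items?page=N (1-indexed)."""
    start = (page - 1) * PAGE_SIZE
    return ITEMS[start:start + PAGE_SIZE]

def scrape_all(max_pages=100):
    """Collect items page by page, stopping at the first empty page."""
    collected = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty page signals the end of the listing
            break
        collected.extend(batch)
    return collected

all_items = scrape_all()
print(len(all_items), "items collected")
```

The max_pages limit is a safety net in case a site never returns an empty page.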

R

Rate limiting: Capping the number of requests per unit of time
Residential proxy: A proxy that uses real ISP-assigned IPs
Retry: Trying again when a request fails
robots.txt: A file with instructions for crawlers
Rotating proxy: A proxy that automatically changes IP on each request
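
The "robots.txt" entry can be checked in code with Python's standard urllib.robotparser module. The rules below are a made-up example fed in as text so no request is sent, and "my-scraper" is a hypothetical user agent name.

```python
from urllib.robotparser import RobotFileParser

# Example rules supplied as text; a real scraper would download /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed_public = parser.can_fetch("my-scraper", "https://example.com/products")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/data")
delay = parser.crawl_delay("my-scraper")  # seconds between requests, if declared

print(allowed_public, allowed_private, delay)
```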

S

Scrapy: A Python framework for large-scale scraping
Selector: A pattern for finding HTML elements
Selenium: A browser automation tool
Session: Keeping cookies and state across requests
Sitemap: An XML file listing a site's URLs
SPA: Single Page Application, a JavaScript-heavy site
Stealth: Techniques for avoiding bot detection
Sticky session: Keeping the same IP for a set period of time
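
To illustrate the "Sitemap" entry, the sketch below pulls the URLs out of a small sitemap using the standard xml.etree.ElementTree module. The sitemap content is invented for the example, and sitemap_urls is a hypothetical helper name; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Invented sitemap content; a real one would be downloaded from the site.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products</loc></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>
"""

# Sitemap elements live in this XML namespace, so lookups must include it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return every <loc> value listed in the sitemap."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

found = sitemap_urls(SITEMAP_XML)
print(found)
```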

T-U

Throttling: Slowing down requests to avoid being blocked
Timeout: The maximum time to wait for a response
User-Agent: A header that identifies the browser or client
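
A simple form of the "Throttling" entry is to enforce a minimum gap between requests. The Throttle class below is a hypothetical helper written for this example, and the 0.05-second interval is chosen only to keep the demo fast; real scrapers usually wait much longer.

```python
import time

class Throttle:
    """Enforces a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # pause until the interval has passed
                now = self._clock()
        self._last = now

throttle = Throttle(min_interval=0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
    # A real scraper would send its request here.
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f}s")
```

The first call goes through immediately; only the second and third calls are delayed.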

W-X

WebSocket: A protocol for real-time communication
XPath: A language for navigating XML/HTML documents
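
The "XPath" entry can be tried out with the standard xml.etree.ElementTree module, which supports a limited subset of XPath; libraries such as lxml offer fuller support. The markup below is an invented, well-formed snippet, since ElementTree requires valid XML rather than arbitrary HTML.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup used as sample input.
DOC = """
<html>
  <body>
    <div class="product"><h2>Keyboard</h2><span class="price">25</span></div>
    <div class="product"><h2>Mouse</h2><span class="price">10</span></div>
    <div class="ad"><h2>Sponsored</h2></div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Select <h2> children of any <div> whose class attribute equals "product".
names = [h2.text for h2 in root.findall(".//div[@class='product']/h2")]

# Select every <span> with class "price", at any depth.
prices = [int(span.text) for span in root.findall(".//span[@class='price']")]

print(names, prices)
```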

VinaProxy Glossary Terms

Residential rotating: Changes IP on each request
Pay-per-GB: Pay for the bandwidth you use
$0.5/GB: Advertised as the lowest price on the market
