SEO Monitoring with Web Scraping: Tracking Keyword Rankings
Tracking keyword rankings is a task worth doing on a regular schedule. This article walks through building a rank tracker with scraping.
Why Rank Tracking?
- Measure SEO effectiveness
- Detect ranking drops early
- Monitor competitors
- Track multiple keywords at once
Important Warning
⚠️ Google may block you if you scrape too aggressively. Use proxies and delays!
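One way to soften this risk is to back off when Google answers with a block status instead of hammering it with retries. A minimal sketch, assuming the block shows up as HTTP 429 or 503 (the retry counts and delays are illustrative):

```python
import time

def fetch_with_backoff(fetch, max_retries=4, base_delay=5):
    """Call `fetch()` and back off exponentially on a block (HTTP 429/503).

    `fetch` is any zero-argument callable returning an object with a
    `status_code` attribute, e.g. a lambda wrapping `requests.get`.
    """
    for attempt in range(max_retries):
        response = fetch()
        if response.status_code not in (429, 503):
            return response
        # Blocked: wait base_delay, then 2x, 4x, ... before retrying
        time.sleep(base_delay * (2 ** attempt))
    return None  # still blocked after all retries
```

Wrap any of the `requests.get` calls below in a lambda and pass it in; if `None` comes back, stop scraping for a while rather than pushing harder.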
Basic Rank Checker
    import random
    import time

    import requests
    from bs4 import BeautifulSoup

    def check_rank(keyword, domain, max_pages=5):
        """Return the 1-based SERP position of `domain` for `keyword`, or None."""
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...'
        }
        for page in range(max_pages):
            start = page * 10
            url = f'https://www.google.com/search?q={keyword}&start={start}'
            response = requests.get(url, headers=headers,
                                    proxies={'http': 'http://proxy.vinaproxy.com:8080'})
            soup = BeautifulSoup(response.text, 'lxml')
            results = soup.select('div.g')
            for i, result in enumerate(results):
                link = result.select_one('a')
                if link and domain in link.get('href', ''):
                    return (page * 10) + i + 1
            time.sleep(random.uniform(2, 5))  # delay between result pages
        return None  # Not in the top results

    # Usage
    rank = check_rank('proxy việt nam', 'vinaproxy.com')
    print(f"Ranking: #{rank}" if rank else "Not found in top 50")
Track Multiple Keywords
    import csv
    import os
    from datetime import datetime

    keywords = [
        'proxy việt nam',
        'residential proxy',
        'web scraping python',
    ]

    results = []
    for kw in keywords:
        rank = check_rank(kw, 'vinaproxy.com')
        results.append({
            'keyword': kw,
            'rank': rank,
            'date': datetime.now().strftime('%Y-%m-%d')
        })
        print(f"{kw}: #{rank}")
        time.sleep(5)  # Important delay!

    # Append to CSV, writing the header row only when the file is new
    write_header = not os.path.exists('rankings.csv')
    with open('rankings.csv', 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['date', 'keyword', 'rank'])
        if write_header:
            writer.writeheader()
        writer.writerows(results)
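With historical rows accumulating in rankings.csv, detecting drops early is a small comparison of the last two snapshots per keyword. A minimal sketch, assuming the file carries a `date,keyword,rank` header row (the threshold of 3 positions is an illustrative choice):

```python
import csv
from collections import defaultdict

def detect_drops(path='rankings.csv', threshold=3):
    """Return {keyword: (previous_rank, latest_rank)} for keywords whose
    rank worsened by >= `threshold` positions between the two most recent
    recorded dates. Higher rank number = worse position."""
    history = defaultdict(list)  # keyword -> [(date, rank), ...]
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            if row['rank']:  # skip keywords that were not found at all
                history[row['keyword']].append((row['date'], int(row['rank'])))
    drops = {}
    for kw, points in history.items():
        points.sort()  # ISO dates sort chronologically as strings
        if len(points) >= 2:
            prev, last = points[-2][1], points[-1][1]
            if last - prev >= threshold:
                drops[kw] = (prev, last)
    return drops
```

Run it after each tracking pass and alert (email, Slack, etc.) on any non-empty result.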
SERP Feature Detection
    def analyze_serp(keyword):
        """Check which SERP features appear for a keyword.

        Note: Google's class names change often; verify these selectors
        periodically.
        """
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...'}
        proxies = {'http': 'http://proxy.vinaproxy.com:8080'}
        # Scrape the SERP
        response = requests.get(f'https://google.com/search?q={keyword}',
                                headers=headers, proxies=proxies)
        soup = BeautifulSoup(response.text, 'lxml')
        features = {
            'featured_snippet': soup.select_one('.kp-blk') is not None,
            'people_also_ask': soup.select_one('.related-question-pair') is not None,
            'local_pack': soup.select_one('.VkpGBb') is not None,
            'images': soup.select_one('.rg_meta') is not None,
            'videos': soup.select_one('.RzdJxc') is not None
        }
        return features
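The dict of booleans is easy to log but hard to scan; a small formatter (a hypothetical helper, not part of any library) turns it into a one-line report:

```python
def format_serp_report(keyword, features):
    """Render analyze_serp()-style feature flags as a one-line summary."""
    present = [name for name, found in features.items() if found]
    if not present:
        return f"{keyword}: no special SERP features"
    return f"{keyword}: " + ", ".join(present)
```

For example, `format_serp_report('proxy việt nam', analyze_serp('proxy việt nam'))` prints just the features that were detected.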
Competitor Tracking
    competitors = ['competitor1.com', 'competitor2.com']

    def compare_rankings(keyword, domains):
        rankings = {}
        for domain in domains:
            rank = check_rank(keyword, domain)
            rankings[domain] = rank
            time.sleep(3)
        return rankings

    # Compare
    comparison = compare_rankings('proxy việt nam',
                                  ['vinaproxy.com'] + competitors)
    print(comparison)
Best Practices
- Track weekly, not daily
- Use residential proxies
- Keep long delays (5-10s) between requests
- Rotate User-Agents
- Store historical data
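The User-Agent rotation point above can be sketched as a small header factory (the User-Agent strings here are illustrative placeholders; use current real browser strings in practice):

```python
import random

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36',
]

def random_headers():
    """Pick a fresh User-Agent per request so traffic looks less uniform."""
    return {'User-Agent': random.choice(USER_AGENTS)}
```

Pass `headers=random_headers()` to each `requests.get` call instead of reusing one fixed dict.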
VinaProxy + SEO Monitoring
- Residential IPs for Google scraping
- Geo-targeted rankings
- Only $0.5/GB
