Geo-Targeting with Proxies: Scrape Like a Local User
Many websites show different content depending on the visitor's location. This article explains how to use geo-targeted proxies to access localized content.
Why Geo-Targeting?
- Price localization: Prices differ by country
- Content restrictions: Some content is geo-blocked
- Search results: Local SEO rankings vary by location
- Ad verification: See which ads are served in each region
- Competitive research: Compare how competitors operate in different markets
How Websites Detect Location
- IP address: The primary method
- Accept-Language header: The browser's language preference
- Timezone: Read through JavaScript in the browser
- GPS: Available on mobile devices
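Because a site can cross-check these signals, a request whose IP points to one country while its language or timezone points to another may stand out. Below is a minimal sketch of such a consistency check. The `EXPECTED` table and the `find_mismatches` helper are hypothetical names made up for illustration, and a single timezone per country is a simplification (the US, for instance, spans several).

```python
# Hypothetical table of the signals a site might expect per exit country.
EXPECTED = {
    "us": {"lang": "en-US", "tz": "America/New_York"},
    "vn": {"lang": "vi-VN", "tz": "Asia/Ho_Chi_Minh"},
}


def find_mismatches(proxy_country, accept_language, timezone):
    """Return the names of the signals that disagree with the proxy country."""
    expected = EXPECTED[proxy_country]
    mismatches = []
    # Only the first (most preferred) language tag is compared.
    primary_lang = accept_language.split(",")[0].strip()
    if primary_lang != expected["lang"]:
        mismatches.append("Accept-Language")
    if timezone != expected["tz"]:
        mismatches.append("timezone")
    return mismatches


# A Vietnamese exit IP combined with an English-first browser profile.
print(find_mismatches("vn", "en-US,en;q=0.9", "Asia/Ho_Chi_Minh"))
```

Running a check like this before sending requests is a cheap way to catch a profile that contradicts the proxy's country.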
Geo-Targeted Proxy Usage
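import requests

# Proxy pinned to a specific country
proxies = {
    'http': 'http://user:pass_country-us@proxy.vinaproxy.com:8080',
    'https': 'http://user:pass_country-us@proxy.vinaproxy.com:8080'
}

# Or city-level targeting (this replaces the country-level dict above)
proxies = {
    'http': 'http://user:pass_country-us_city-newyork@proxy.vinaproxy.com:8080',
    # The 'https' entry is required; without it HTTPS requests bypass the proxy
    'https': 'http://user:pass_country-us_city-newyork@proxy.vinaproxy.com:8080'
}

response = requests.get('https://example.com', proxies=proxies)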
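Writing these URLs by hand for every country gets repetitive, so a small builder can help. The sketch below assumes the credential format used in this article's examples (targeting tags appended to the password); `build_proxies` is a hypothetical helper, and the exact format should be checked against the provider's documentation.

```python
def build_proxies(user, password, country, city=None,
                  host="proxy.vinaproxy.com", port=8080):
    """Build a requests-style proxies dict for one country and optional city."""
    # Assumption: targeting tags are appended to the password, as in the
    # examples in this article. Your provider's format may differ.
    tags = f"_country-{country}"
    if city:
        tags += f"_city-{city}"
    proxy_url = f"http://{user}:{password}{tags}@{host}:{port}"
    # Use the same proxy for both schemes so HTTPS requests are routed too.
    return {"http": proxy_url, "https": proxy_url}


print(build_proxies("user", "pass", "us", city="newyork")["https"])
```

The returned dict can be passed directly as the `proxies` argument of `requests.get`.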
Match Headers to the Location
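def get_localized_headers(country):
    # Language and timezone that match each proxy country
    locale_map = {
        'us': {'lang': 'en-US', 'tz': 'America/New_York'},
        'vn': {'lang': 'vi-VN', 'tz': 'Asia/Ho_Chi_Minh'},
        'jp': {'lang': 'ja-JP', 'tz': 'Asia/Tokyo'},
        'de': {'lang': 'de-DE', 'tz': 'Europe/Berlin'},
    }
    # Fall back to the US profile for unknown countries
    locale = locale_map.get(country, locale_map['us'])
    # The timezone is not an HTTP header; it only matters in a real browser
    return {
        'Accept-Language': f"{locale['lang']},en;q=0.9",
        'User-Agent': 'Mozilla/5.0...',
    }

url = 'https://example.com'
headers = get_localized_headers('vn')
# `proxies` should point at the same country as the headers
response = requests.get(url, headers=headers, proxies=proxies)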
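Real browsers usually send several language tags with descending quality values rather than a single locale. As a rough sketch, a hypothetical `build_accept_language` helper (not part of `requests` or any library) could generate such a header value from an ordered list of tags:

```python
def build_accept_language(tags):
    """Join language tags with descending q-values, most preferred first."""
    parts = []
    for index, tag in enumerate(tags):
        if index == 0:
            # q=1.0 is implied for the first tag, so it is left out.
            parts.append(tag)
        else:
            # Drop 0.1 per position, but never go below 0.1.
            q = max(1.0 - 0.1 * index, 0.1)
            parts.append(f"{tag};q={q:.1f}")
    return ",".join(parts)


print(build_accept_language(["vi-VN", "vi", "en-US", "en"]))
```

The result can be used as the `Accept-Language` value in the headers dict.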
Playwright with Geo Settings
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Route all browser traffic through a Vietnamese exit IP
    browser = p.chromium.launch(
        proxy={
            'server': 'http://proxy.vinaproxy.com:8080',
            'username': 'user_country-vn',
            'password': 'pass'
        }
    )
    # Align locale, timezone and GPS coordinates with the proxy country
    context = browser.new_context(
        locale='vi-VN',
        timezone_id='Asia/Ho_Chi_Minh',
        geolocation={'latitude': 10.8231, 'longitude': 106.6297},
        permissions=['geolocation']
    )
    page = context.new_page()
    page.goto('https://www.google.com')
    # Google should now serve its Vietnamese version
    browser.close()
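When several countries are involved, the context settings can be kept in one table and expanded on demand. This is a minimal sketch, assuming a hypothetical `PROFILES` table and `context_options` helper; the coordinates are approximate city centres chosen for illustration.

```python
# Hypothetical per-country browser profiles (coordinates are approximate).
PROFILES = {
    "vn": {"locale": "vi-VN", "timezone_id": "Asia/Ho_Chi_Minh",
           "latitude": 10.8231, "longitude": 106.6297},
    "jp": {"locale": "ja-JP", "timezone_id": "Asia/Tokyo",
           "latitude": 35.6762, "longitude": 139.6503},
}


def context_options(country):
    """Return keyword arguments for Playwright's browser.new_context()."""
    profile = PROFILES[country]
    return {
        "locale": profile["locale"],
        "timezone_id": profile["timezone_id"],
        "geolocation": {
            "latitude": profile["latitude"],
            "longitude": profile["longitude"],
        },
        # Grant the permission so pages can read the spoofed coordinates.
        "permissions": ["geolocation"],
    }


print(context_options("jp")["timezone_id"])
```

With a launched browser, the dict would be unpacked as `browser.new_context(**context_options("jp"))`.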
Price Comparison Across Countries
countries = ['us', 'uk', 'de', 'jp', 'vn']

def get_price(product_url, country):
    proxy_url = f'http://user:pass_country-{country}@proxy.vinaproxy.com:8080'
    # Set both schemes so HTTPS product pages also go through the proxy
    proxies = {'http': proxy_url, 'https': proxy_url}
    headers = get_localized_headers(country)
    response = requests.get(product_url, proxies=proxies, headers=headers)
    # Parse price from response (parse_price is a site-specific helper you supply)
    return parse_price(response.text)

# Compare prices
for country in countries:
    price = get_price('https://store.com/product', country)
    print(f"{country.upper()}: {price}")
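Price strings use different separators depending on the locale (`$1,299.00` in the US, `1.299,00 €` in Germany), so the parsing step needs some normalization. The sketch below is one possible approach, not the article's `parse_price`: the hypothetical `parse_price_text` assumes you have already extracted the text of the price element from the page, and it guesses the decimal separator from the number of trailing digits.

```python
import re
from decimal import Decimal

# Matches digit runs with optional '.' or ',' separators, e.g. 1,299.00
_NUMBER = re.compile(r"\d[\d.,]*\d|\d")


def parse_price_text(text):
    """Extract the first number from a price string as a Decimal, or None."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    raw = match.group()
    sep_pos = max(raw.rfind("."), raw.rfind(","))
    # Heuristic: the last separator is a decimal mark only when it is
    # followed by one or two digits; three digits means thousands grouping.
    if sep_pos != -1 and len(raw) - sep_pos - 1 in (1, 2):
        integer_part = re.sub(r"[.,]", "", raw[:sep_pos])
        return Decimal(f"{integer_part}.{raw[sep_pos + 1:]}")
    return Decimal(re.sub(r"[.,]", "", raw))


for sample in ["$1,299.00", "1.299,00 €", "24.990.000 ₫"]:
    print(sample, "->", parse_price_text(sample))
```

The heuristic will misread currencies that use three decimal places, so treat it as a starting point and adapt it per store.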
Local Search Rankings
def check_local_ranking(keyword, domain, country):
    proxy_url = f'http://user:pass_country-{country}@proxy.vinaproxy.com:8080'
    # Set both schemes; the search URL is HTTPS
    proxies = {'http': proxy_url, 'https': proxy_url}
    # Scrape local Google; `params` URL-encodes the keyword
    response = requests.get(
        'https://www.google.com/search',
        params={'q': keyword, 'gl': country},
        proxies=proxies,
        headers=get_localized_headers(country),
    )
    # Find domain position
    # ... parsing logic
    return None  # placeholder until the parsing logic is implemented

# Check rankings in different countries
for country in ['us', 'uk', 'au']:
    rank = check_local_ranking('web scraping', 'vinaproxy.com', country)
    print(f"{country}: #{rank}")
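The "find domain position" step can be sketched independently of how the result links are scraped. The example below assumes you already have the organic result URLs as an ordered list (extracting them from Google's HTML is not shown, since the markup changes often); `find_rank` is a hypothetical helper.

```python
from urllib.parse import urlparse


def find_rank(result_urls, domain):
    """Return the 1-based position of the first URL on `domain`, or None."""
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself and any subdomain (for example www.).
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Illustrative input only; these are not real search results.
results = [
    "https://example.org/scraping-guide",
    "https://www.vinaproxy.com/blog/geo-targeting",
    "https://example.net/proxies",
]
print(find_rank(results, "vinaproxy.com"))
```

Comparing hostnames rather than searching the raw URL avoids false matches where the domain only appears in a path or query string.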
VinaProxy Geo-Targeting
- 195+ countries available
- City-level targeting
- Priced at just $0.5/GB
