DataDome Solving with Surfsky

Overview

DataDome is a leading bot protection solution used by many e-commerce and high-traffic websites. Surfsky provides multiple approaches to handle DataDome challenges, from simple automatic solving to advanced proxy rotation strategies.

Quick Start

For most use cases, our simplified CAPTCHA solving approach handles DataDome automatically. See our comprehensive CAPTCHA Solving Guide for full details.

Prerequisites

  1. Enable anti_captcha in your browser profile settings
  2. Gemini API Key - Get one from Google AI Studio
    • Pass it when starting your browser (see the sketch after this list and the API Reference for details)
  3. Use quality proxies to minimize DataDome challenges
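
Below is a minimal sketch of requesting a browser with these settings through the one-time profile endpoint used later in this guide. The anti_captcha and gemini_api_key field names are illustrative assumptions, not the confirmed schema; check the API Reference for the exact field names:

import httpx

CLOUD_API_TOKEN = "YOUR_API_TOKEN"

def start_browser() -> str:
    """Request a one-time profile and return its CDP endpoint."""
    response = httpx.post(
        "https://api-public.surfsky.io/profiles/one_time",
        headers={
            "Content-Type": "application/json",
            "X-Cloud-Api-Token": CLOUD_API_TOKEN,
        },
        json={
            "browser_settings": {
                "anti_captcha": True,                 # assumed field name
                "gemini_api_key": "YOUR_GEMINI_KEY",  # assumed field name
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    # ws_url is the CDP endpoint to pass to connect_over_cdp()
    return response.json()["ws_url"]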

Best Practices for DataDome

DataDome is particularly sensitive to automation patterns. To minimize challenges:

  1. Use Quality Proxies - Residential or mobile proxies work best
  2. Rotate Intelligently - For aggressive scraping, change the proxy every 3-5 requests (see the sketch after this list)
  3. Act Human - Add delays and use Human Emulation
  4. Reuse Success - A profile that has passed DataDome carries valuable cookies; reusing it can sustain 25+ requests
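
A minimal sketch of the rotation policy from point 2. It assumes a hypothetical start_browser() helper that requests a new Surfsky profile on a fresh proxy and returns its ws_url (for example, the sketch from Prerequisites, adapted to pick a new proxy per call):

import asyncio
from playwright.async_api import async_playwright

REQUESTS_PER_PROXY = 4  # within the 3-5 range suggested above

async def scrape_with_rotation(urls: list[str]) -> None:
    async with async_playwright() as p:
        browser = page = None
        for i, url in enumerate(urls):
            # Start a fresh browser on a new proxy every REQUESTS_PER_PROXY URLs
            if i % REQUESTS_PER_PROXY == 0:
                if browser:
                    await browser.close()
                # start_browser() is a hypothetical helper returning a ws_url
                browser = await p.chromium.connect_over_cdp(start_browser())
                page = await browser.new_page()
            await page.goto(url)
            # ... extract what you need from the page here ...
        if browser:
            await browser.close()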

Solving Methods

Method 1: Simple Solving

The easiest way to handle DataDome - detect and solve when needed:

import asyncio
from playwright.async_api import async_playwright

async def solve_datadome_simple():
    async with async_playwright() as p:
        # Connect to your Surfsky browser
        browser = await p.chromium.connect_over_cdp("ws://your-browser-url")
        page = await browser.new_page()

        # Create CDP session
        client = await page.context.new_cdp_session(page)

        # Navigate to DataDome protected site
        await page.goto("https://example.com/protected")

        # Check if DataDome challenge appears
        if await page.query_selector('iframe[src*="datadome"]'):
            print("DataDome detected, solving...")

            # Solve DataDome
            response = await client.send("Captcha.solve", {"type": "datadome"})

            if response.get("status") == "success":
                print("✓ DataDome solved!")
                # Page will reload automatically with solution
                await page.wait_for_load_state("networkidle")
                # Continue scraping
                content = await page.content()
                print("Page loaded successfully")
            else:
                print("✗ Failed to solve DataDome")

        await browser.close()

asyncio.run(solve_datadome_simple())

Method 2: Auto Solving

Let the browser automatically handle DataDome challenges:

import asyncio
from playwright.async_api import async_playwright

async def solve_datadome_auto():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp("ws://your-browser-url")
        page = await browser.new_page()
        client = await page.context.new_cdp_session(page)

        # Enable auto-solving for DataDome
        await client.send("Captcha.autoSolve", {"type": "datadome"})

        print("Auto-solve activated - DataDome will be solved automatically")

        # Navigate to any page - DataDome challenges will be handled automatically
        await page.goto("https://example.com/protected")

        # Continue with your scraping
        # Any DataDome that appears will be solved in the background

        await browser.close()

asyncio.run(solve_datadome_auto())

Legacy Method: Advanced Implementation with 2Captcha

When to Use Legacy Method

Use this approach only if:

  • You need custom 2Captcha integration
  • You're migrating from an older implementation

Prerequisites for Legacy Method

  1. API Key - Your Surfsky API token
  2. Proxies - High-quality residential or mobile proxies
  3. 2Captcha Key (Optional) - For explicit captcha solving

Legacy Implementation

from datetime import datetime
import re
import uuid
import httpx
import asyncio
from playwright.async_api import async_playwright, Page, Browser
from bs4 import BeautifulSoup


CLOUD_API_TOKEN = "YOUR_API_TOKEN"

SOLVE_CAPTCHA = False
CAPTCHA_KEY = "YOUR_2CAPTCHA_KEY"


def generate_proxy() -> str:
    """Generate a proxy string with a new random SID"""
    proxy_sid = str(uuid.uuid4().hex)
    # Replace with your proxy details; embed the SID in the username if
    # your provider supports sticky sessions that way
    return f"socks5://username-{proxy_sid}:password@host:port"


class DataDomeIPBannedException(Exception):
    """Raised when DataDome has banned the IP address (t=bv in captcha URL)"""
    pass


class DataDomeCaptchaException(Exception):
    """Raised when there's an issue with the captcha solving process"""
    pass


class DataDomeCaptchaSolver:
    def __init__(self, api_key: str):
        self.api_key = api_key

    def format_proxy_string(self, proxy_string: str) -> dict:
        regex = r'^(socks5|http|https):\/\/([^:]+):([^@]+)@([^:]+):(\d+)$'
        match = re.match(regex, proxy_string)

        if not match:
            raise ValueError('Invalid proxy string format')

        protocol, login, password, host, port = match.groups()

        return {
            "proxyType": protocol,
            "proxyAddress": host,
            "proxyPort": port,
            "proxyLogin": login,
            "proxyPassword": password
        }

    async def solve(self, page: Page, url: str, proxy: str) -> None:
        captcha_url = await self.maybe_get_captcha_url(page)

        if not captcha_url:
            print("DataDome captcha not found")
            return

        print("Found captcha. Trying to solve...")
        print("captcha_url:", captcha_url)

        # Check if IP is banned
        if 't=bv' in captcha_url:
            raise DataDomeIPBannedException("IP is banned by DataDome (t=bv). Need to change IP address.")

        if 't=fe' not in captcha_url:
            raise DataDomeCaptchaException("Expected t=fe in captcha URL")

        user_agent = await self.get_user_agent(page)
        proxy_obj = self.format_proxy_string(proxy)

        task_id = await self.create_datadome_task(url, captcha_url, user_agent, proxy_obj)
        print("2captcha taskId:", task_id)

        solution_cookies = await self.get_task_result(task_id)
        print("2captcha solutionCookies:", solution_cookies)

        await self.set_cookies(page, solution_cookies)

        await page.reload(wait_until='networkidle')

    async def set_cookies(self, page: Page, solution_cookies: str) -> None:
        """Set cookies from the solution_cookies string"""
        await page.evaluate(f'''() => {{
            document.cookie = `{solution_cookies}`;
        }}''')
        print("Cookies set")

    async def maybe_get_captcha_url(self, page: Page) -> str | None:
        return await page.evaluate('''() => {
            const frame = document.querySelector('iframe[src*="captcha-delivery.com/captcha/"]');
            return frame ? frame.src : null;
        }''')

    async def get_user_agent(self, page: Page) -> str:
        return await page.evaluate('() => navigator.userAgent')

    async def create_datadome_task(self, website_url: str, captcha_url: str, user_agent: str, proxy_obj: dict) -> str:
        task_data = {
            "type": "DataDomeSliderTask",
            "websiteURL": website_url,
            "captchaUrl": captcha_url,
            "userAgent": user_agent,
            "proxyType": proxy_obj["proxyType"],
            "proxyAddress": proxy_obj["proxyAddress"],
            "proxyPort": proxy_obj["proxyPort"],
            "proxyLogin": proxy_obj["proxyLogin"],
            "proxyPassword": proxy_obj["proxyPassword"]
        }

        data = {
            "clientKey": self.api_key,
            "task": task_data
        }

        async with httpx.AsyncClient() as client:
            response = await client.post('https://api.2captcha.com/createTask', json=data)
            response_data = response.json()
            return response_data["taskId"]

    async def get_task_result(self, task_id: str) -> str:
        while True:
            await asyncio.sleep(5)

            async with httpx.AsyncClient() as client:
                response = await client.get('https://2captcha.com/res.php', params={
                    "key": self.api_key,
                    "action": "get",
                    "id": task_id
                })

            result = response.text

            if result == 'CAPCHA_NOT_READY':
                print('2captcha task not ready yet. Waiting...')
                continue

            print(result)

            if not result.startswith('OK|'):
                raise Exception(f"Failed to get 2captcha result: {result}")

            return result.split('|')[1]


class SurfskyBrowser:
    def __init__(self, playwright, proxy: str):
        self.playwright = playwright
        self.proxy = proxy
        self.browser: Browser | None = None
        self.page: Page | None = None

    async def setup(self) -> None:
        """Initialize browser and page"""
        cdp_url = await self._start_browser()
        print(f"Connecting to browser at {cdp_url}")

        self.browser = await self.playwright.chromium.connect_over_cdp(cdp_url)
        context = self.browser.contexts[0] if self.browser.contexts else await self.browser.new_context()
        self.page = context.pages[0] if context.pages else await context.new_page()
        print("Browser setup complete")

    async def _start_browser(self) -> str:
        """Initialize and start a browser session"""
        url = "https://api-public.surfsky.io/profiles/one_time"
        headers = {
            "Content-Type": "application/json",
            "X-Cloud-Api-Token": CLOUD_API_TOKEN
        }
        data = {
            "browser_settings": {
                "inactive_kill_timeout": 60,
            },
            "fingerprint": {
                "os": "mac"
            }
        }
        if self.proxy:
            data["proxy"] = self.proxy

        async with httpx.AsyncClient(timeout=60) as client:
            response = await client.post(url, headers=headers, json=data)
            response.raise_for_status()
            data = response.json()
            devtools_url = data["inspector"]["pages"][0]["devtools_url"]
            print("Devtools URL:\n", devtools_url)
            return data["ws_url"]

    async def close(self) -> None:
        """Safely close browser"""
        if self.browser:
            try:
                await self.browser.close()
                print("Browser closed successfully")
            except Exception as e:
                print(f"Error closing browser: {str(e)}")


class UseCase:
    """Use case for scraping websites with DataDome handling"""

    def __init__(self, browser: SurfskyBrowser, solver: DataDomeCaptchaSolver, solve_captcha: bool = False):
        self.browser = browser
        self.solver = solver
        self.solve_captcha = solve_captcha

    async def execute(self, url: str) -> bool:
        """Execute the scraping use case for a single URL"""
        if not self.browser.page:
            raise RuntimeError("Browser not initialized")

        try:
            await self._navigate_to_page(url)
            await self._handle_captcha(url)
            await self._save_screenshot()
            # Here you can add more specific scraping logic
            # await self._extract_reviews()
            return True

        except DataDomeIPBannedException as e:
            print(f"IP banned at URL {url}: {str(e)}")
            raise  # Propagate so the scraper can rotate the browser/proxy
        except Exception as e:
            print(f"Error processing URL {url}: {str(e)}")
            return True  # Non-IP-ban errors considered "successful"

    async def _navigate_to_page(self, url: str) -> None:
        await self.browser.page.goto(url)

        html = await self.browser.page.content()
        soup = BeautifulSoup(html, 'html.parser')
        print(f"Page title: {soup.title.string if soup.title else 'N/A'}")

    async def _handle_captcha(self, url: str) -> None:
        """Handle DataDome captcha if present"""
        if not self.solve_captcha:
            print("Captcha solving disabled, skipping...")
            return
        await self.solver.solve(self.browser.page, url, self.browser.proxy)

    async def _save_screenshot(self) -> None:
        """Save screenshot of the current page"""
        ts = datetime.now().strftime("%Y%m%d_%H%M%S")
        screenshot_filename = f"screenshot_{ts}.png"
        await self.browser.page.screenshot(path=screenshot_filename)
        print(f"Screenshot saved as {screenshot_filename}")


class DataDomeScraper:
    def __init__(self, solve_captcha: bool = SOLVE_CAPTCHA):
        self.browser: SurfskyBrowser | None = None
        self.solver = DataDomeCaptchaSolver(CAPTCHA_KEY)
        self.solve_captcha = solve_captcha

    async def _rotate_browser(self, playwright) -> None:
        """Rotate browser: close existing and create new one"""
        # Close existing browser if any
        if self.browser:
            try:
                await self.browser.close()
                print("Previous browser closed successfully")
            except Exception as e:
                print(f"Error closing previous browser: {str(e)}")
            finally:
                self.browser = None

        # Create new browser
        try:
            proxy = generate_proxy()
            print("Creating new browser")
            self.browser = SurfskyBrowser(playwright, proxy)
            await self.browser.setup()
            print("New browser setup completed")
        except Exception as e:
            print(f"Failed to create new browser: {str(e)}")
            self.browser = None
            raise

    async def process_url(self, playwright, url: str) -> bool:
        """Process single URL with browser management"""
        if not self.browser:
            await self._rotate_browser(playwright)

        try:
            use_case = UseCase(self.browser, self.solver, self.solve_captcha)
            return await use_case.execute(url)

        except DataDomeIPBannedException:
            print("IP banned, rotating browser...")
            await self._rotate_browser(playwright)
            return False

        except Exception as e:
            print(f"Unexpected error during URL processing: {str(e)}")
            await self._rotate_browser(playwright)
            raise


async def process_urls(urls: list[str], solve_captcha: bool = True) -> None:
    """Main function to process all URLs"""
    remaining_urls = urls.copy()

    async with async_playwright() as p:
        scraper = DataDomeScraper(solve_captcha)

        while remaining_urls:
            url = remaining_urls[0]
            try:
                success = await scraper.process_url(p, url)
                if success:
                    remaining_urls.pop(0)
                    print(f"Processed URL {url}. Remaining: {len(remaining_urls)}")
            except Exception as e:
                print(f"Error processing URL {url}: {str(e)}")
                await scraper._rotate_browser(p)  # Ensure clean state after error

    print("All URLs processed")


if __name__ == "__main__":
    # Example URLs
    urls = [
        # Change to your target URL
        "https://www.example.com/page1",
        "https://www.example.com/page2",
    ]
    # Use global SOLVE_CAPTCHA setting
    asyncio.run(process_urls(urls, solve_captcha=SOLVE_CAPTCHA))

DataDome-Specific Tips

IP Ban Detection

If you see t=bv in the DataDome URL, your IP is banned (a detection sketch follows this list):

  • Switch to a new proxy immediately
  • Consider using a different proxy provider
  • Reduce request frequency
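
A minimal sketch of that check, reusing the iframe lookup from the legacy implementation above:

from playwright.async_api import Page

async def is_ip_banned(page: Page) -> bool:
    # Look up the DataDome challenge iframe, if any
    captcha_url = await page.evaluate('''() => {
        const frame = document.querySelector('iframe[src*="captcha-delivery.com/captcha/"]');
        return frame ? frame.src : null;
    }''')
    # t=bv in the challenge URL signals a banned IP; t=fe is a solvable challenge
    return bool(captcha_url) and 't=bv' in captcha_url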

Need Help?

For more details on CAPTCHA solving, see our comprehensive CAPTCHA Solving Guide.

For advanced proxy management and error handling, check our API Reference.

Questions? Contact us at [email protected].