Scraping API

Using Scraping API

Prerequisites

To use SurfSky, you'll need the following:

  1. API Key - A unique authentication token for accessing our services
  2. Assigned Hostname - Your dedicated SurfSky endpoint
  3. Proxies - One or more proxies in the format:
    protocol://username:password@host:port
    Supported protocols: HTTP, HTTPS, SOCKS5, SSH

To obtain your API key and hostname, or if you need proxies, please contact our team.
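A proxy string in the format above can be assembled with a small helper. This is an illustrative sketch; the function name and the credentials are placeholders, not part of the SurfSky API.

```python
def format_proxy(protocol: str, username: str, password: str,
                 host: str, port: int) -> str:
    """Build a proxy string in the protocol://username:password@host:port format."""
    return f"{protocol}://{username}:{password}@{host}:{port}"


# Placeholder credentials and a documentation-reserved IP address.
proxy = format_proxy("socks5", "user", "pass", "203.0.113.10", 1080)
print(proxy)  # socks5://user:pass@203.0.113.10:1080
```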

Code Example

import requests
import base64


def create_profile(api_token: str, proxy: str | None = None) -> dict:
    """Create a one-time browser profile."""
    url = "https://api-public.surfsky.io/profiles/one_time"

    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }

    data = {}
    if proxy:
        data["proxy"] = proxy

    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()

    return response.json()


def scrape_page(api_token: str, internal_uuid: str, url: str) -> dict:
    """Scrape a webpage using SurfSky's Scraping API."""
    api_url = f"https://api-public.surfsky.io/profiles/{internal_uuid}/scrape"

    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }

    data = {
        "url": url,
        "screenshot": True,
        "wait": 10,
        "wait_until": "domcontentloaded",
        "timeout": 30000  # Page load timeout in milliseconds
    }

    response = requests.post(api_url, json=data, headers=headers)
    response.raise_for_status()

    return response.json()


# Usage Example
API_TOKEN = "YOUR_API_TOKEN"
TARGET_URL = "https://example.com"

# First create a profile
profile = create_profile(API_TOKEN)
internal_uuid = profile['internal_uuid']

# Then use it to scrape
result = scrape_page(API_TOKEN, internal_uuid, TARGET_URL)

# Access the results
html_content = result['data']['content']
cookies = result['data']['cookies']

# Save screenshot if requested
if result['data']['screenshot']:
    with open('screenshot.png', 'wb') as f:
        f.write(base64.b64decode(result['data']['screenshot']))

Scraping Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | string | required | The URL to scrape |
| `screenshot` | boolean | `false` | Whether to capture a screenshot |
| `wait` | number | `0` | Time to wait after page load (0-60 seconds) |
| `wait_until` | string | `"domcontentloaded"` | When to consider navigation complete: `"domcontentloaded"`, `"load"`, `"networkidle"`, or `"commit"` |
| `wait_for` | string | `null` | XPath or CSS selector to wait for on the page |
| `timeout` | number | `30000` | Maximum time to wait for page load, in milliseconds |
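Putting the parameters together, a request body might look like the following. This is a hypothetical example: the target URL and CSS selector are illustrative, not taken from the SurfSky docs.

```python
import json

# Example request body combining the scraping parameters above.
payload = {
    "url": "https://example.com/products",  # illustrative target
    "screenshot": False,
    "wait": 5,                        # seconds to wait after page load
    "wait_until": "networkidle",      # wait until network activity settles
    "wait_for": "div.product-list",   # illustrative CSS selector to wait for
    "timeout": 45000,                 # page load timeout in milliseconds
}
print(json.dumps(payload, indent=2))
```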

Request Queue and Rate Limits

The scraping API includes a built-in queue system to handle concurrent requests:

  • Queue Size: Maximum of 5 requests can be queued per profile
  • Processing: Requests are processed sequentially (FIFO - First In, First Out)
  • Rate Limiting: If a request arrives while the queue is already full (5 requests waiting), a 429 Too Many Requests error is returned
  • Concurrent Profiles: You can run multiple profiles simultaneously according to your subscription plan
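One way to cope with the 429 responses described above is exponential backoff. The sketch below is generic retry logic, not part of the SurfSky client; the helper name and the `(status, body)` return shape of `send` are assumptions for illustration.

```python
import time


def with_retry_on_429(send, max_retries: int = 3, backoff: float = 1.0):
    """Call `send()` (which returns an HTTP status code and a body),
    retrying with exponential backoff while it returns 429 (queue full)."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        # Queue full: wait before retrying, doubling the delay each time.
        time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("Scrape queue still full after retries")


# Demo with a fake sender: two 429 responses, then success.
responses = iter([(429, None), (429, None), (200, "<html>...</html>")])
status, body = with_retry_on_429(lambda: next(responses), backoff=0.01)
print(status)  # 200
```

In production, `send` would wrap the `requests.post` call to the scrape endpoint and return `(response.status_code, response.json())`.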

Important Notes

  • You need to first create a profile using the profiles/one_time endpoint to get the internal_uuid
  • Screenshots are returned as base64-encoded strings
  • Cookies from the session are included in the response
  • Always remember to close the browser when you're done to release your session limit. Inactive sessions are automatically closed after 30 seconds; set inactive_kill_timeout to change this value.
  • A one-time profile is used only once and then deleted; use persistent profiles for long-term sessions
  • A proxy is required for scraping and must be passed to the create_profile function
  • You can run multiple sessions according to your subscription plan's session limit
  • The timeout parameter prevents hanging on slow or unresponsive pages
  • When the queue is full (5 requests waiting), additional requests will receive a 429 error immediately
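The notes above mention the inactive_kill_timeout setting. A profile-creation body that raises it might look like the sketch below; the field's placement in this request and its unit (seconds) are assumptions to confirm against the API Reference.

```python
# Sketch of a one-time profile request body with a longer inactivity timeout.
profile_request = {
    "proxy": "socks5://user:pass@203.0.113.10:1080",  # placeholder proxy
    "inactive_kill_timeout": 60,  # assumed unit: seconds before an idle session is closed
}
print(profile_request["inactive_kill_timeout"])
```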

For more advanced usage and error handling, check out our API Reference.