Scraping API

Using Scraping API

Prerequisites

To use SurfSky, you'll need the following:

  1. API Key - A unique authentication token for accessing our services
  2. Assigned Hostname - Your dedicated SurfSky endpoint
  3. Proxies - Proxy URLs in the format:
    protocol://username:password@host:port
    Supported protocols: HTTP, HTTPS, SOCKS5, SSH
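
Before sending a proxy string to the API, it can help to check it locally against the format above. The following is a minimal sketch, not part of SurfSky: `validate_proxy` and `SUPPORTED_PROTOCOLS` are hypothetical names, and it assumes that credentials and a port are always present, as the format suggests.

```python
from urllib.parse import urlsplit

# Protocols listed in the prerequisites, lowercased for comparison.
SUPPORTED_PROTOCOLS = {"http", "https", "socks5", "ssh"}


def validate_proxy(proxy: str) -> str:
    """Return the proxy string unchanged if it matches
    protocol://username:password@host:port, else raise ValueError."""
    parts = urlsplit(proxy)
    if parts.scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {parts.scheme!r}")
    # Assumption: username and password are mandatory, per the documented format.
    if not parts.username or not parts.password:
        raise ValueError("proxy must include username and password")
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError as exc:
        raise ValueError("proxy port is not a valid number") from exc
    if not parts.hostname or port is None:
        raise ValueError("proxy must include host and port")
    return proxy
```

A call such as `validate_proxy("socks5://user:pass@10.0.0.1:1080")` returns the string unchanged, while a string with a missing port or an unsupported protocol raises `ValueError`.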

To obtain your API key and hostname, please contact our team.

If you need proxies, please contact our team.

Code Example

import requests
import base64


def create_profile(api_token: str, proxy: str | None = None) -> dict:
    """Create a one-time browser profile"""
    url = "https://api-public.surfsky.io/profiles/one_time"

    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }

    data = {}
    if proxy:
        data["proxy"] = proxy

    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()

    return response.json()


def scrape_page(api_token: str, internal_uuid: str, url: str):
    """Scrape a webpage using SurfSky's Scraping API"""
    api_url = f"https://api-public.surfsky.io/profiles/{internal_uuid}/scrape"

    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }

    data = {
        "url": url,
        "screenshot": True,
        "wait": 10,
        "wait_until": "domcontentloaded"
    }

    response = requests.post(api_url, json=data, headers=headers)
    response.raise_for_status()

    return response.json()


# Usage Example
API_TOKEN = "YOUR_API_TOKEN"
PROXY = "protocol://username:password@host:port"  # replace with your proxy
TARGET_URL = "https://example.com"

# First create a profile (a proxy is required)
profile = create_profile(API_TOKEN, PROXY)
internal_uuid = profile['internal_uuid']

# Then use it to scrape
result = scrape_page(API_TOKEN, internal_uuid, TARGET_URL)

# Access the results
html_content = result['data']['content']
cookies = result['data']['cookies']

# Save the screenshot if one was returned
screenshot = result['data'].get('screenshot')
if screenshot:
    with open('screenshot.png', 'wb') as f:
        f.write(base64.b64decode(screenshot))
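
The result handling above can also be wrapped in a small helper that writes each part of the response to disk. This is a sketch under assumptions: `save_scrape_result` and the output file names are hypothetical, and it relies only on the `data.content`, `data.cookies`, and `data.screenshot` fields used in the example.

```python
import base64
import json
from pathlib import Path


def save_scrape_result(result: dict, out_dir: str) -> list[str]:
    """Write the HTML, cookies, and screenshot from a scrape response
    into out_dir and return the names of the files written."""
    data = result.get("data", {})
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    if data.get("content") is not None:
        (out / "page.html").write_text(data["content"], encoding="utf-8")
        written.append("page.html")

    if data.get("cookies") is not None:
        # The cookies come from response.json(), so they are JSON-serializable.
        (out / "cookies.json").write_text(
            json.dumps(data["cookies"], indent=2), encoding="utf-8"
        )
        written.append("cookies.json")

    if data.get("screenshot"):
        # Screenshots are returned as base64-encoded strings.
        (out / "screenshot.png").write_bytes(base64.b64decode(data["screenshot"]))
        written.append("screenshot.png")

    return written
```

Fields that are missing or empty are skipped, so the helper also works when no screenshot was requested.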

Important Notes

  • First create a profile with the profiles/one_time endpoint to get the internal_uuid used in the scrape URL
  • Screenshots are returned as base64-encoded strings
  • Cookies from the session are included in the response
  • Always close the browser when you're done to free up a slot in your session limit. Inactive sessions are closed automatically after 30 seconds; set inactive_kill_timeout to change this value.
  • A one-time profile is used once and then deleted. Use persistent profiles for long-term sessions
  • A proxy is required and must be passed to the create_profile function
  • You can run multiple sessions according to your subscription plan's session limit
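
When scraping several URLs, one way to stay within the session limit is to cap the number of concurrent calls on the client side. The sketch below is illustrative only: `scrape_many`, `scrape_one`, and `session_limit` are hypothetical names, and the actual limit depends on your subscription plan.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def scrape_many(
    urls: list[str],
    scrape_one: Callable[[str], dict],
    session_limit: int,
) -> list[dict]:
    """Call scrape_one for every URL with at most session_limit calls in flight."""
    if session_limit < 1:
        raise ValueError("session_limit must be at least 1")
    # max_workers caps concurrent calls, and therefore concurrent sessions.
    with ThreadPoolExecutor(max_workers=session_limit) as pool:
        # map() yields results in the same order as the input URLs.
        return list(pool.map(scrape_one, urls))
```

Here `scrape_one` would typically create a fresh one-time profile and then scrape a single URL with it, since a one-time profile is used only once. Any exception raised by `scrape_one` propagates to the caller when the results are collected.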

For more advanced usage and error handling, check out our API Reference.