Scraping API

Using Scraping API

Prerequisites

To use SurfSky, you'll need the following:

  1. API Key - A unique authentication token for accessing our services
  2. Assigned Hostname - Your dedicated SurfSky endpoint
  3. Proxies - One or more proxies in the format:
    protocol://username:password@host:port
    Supported protocols: HTTP, HTTPS, SOCKS5, SSH

To obtain your API key and hostname, or if you need proxies, please contact our team.
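A proxy string in the format above can be assembled with a small helper. This is an illustrative sketch; the function name and the credentials are placeholders, not part of the SurfSky API.

```python
def format_proxy(protocol: str, username: str, password: str,
                 host: str, port: int) -> str:
    """Build a proxy string in the protocol://username:password@host:port format."""
    return f"{protocol}://{username}:{password}@{host}:{port}"


# Placeholder credentials and a documentation-reserved IP address.
proxy = format_proxy("socks5", "user", "pass", "203.0.113.10", 1080)
print(proxy)  # socks5://user:pass@203.0.113.10:1080
```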

Code Example

import requests
import base64


def create_profile(api_token: str, proxy: str | None = None) -> dict:
    """Create a one-time browser profile."""
    url = "https://api-public.surfsky.io/profiles/one_time"

    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }

    data = {}
    if proxy:
        data["proxy"] = proxy

    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()

    return response.json()


def scrape_page(api_token: str, internal_uuid: str, url: str) -> dict:
    """Scrape a webpage using SurfSky's Scraping API."""
    api_url = f"https://api-public.surfsky.io/profiles/{internal_uuid}/scrape"

    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }

    data = {
        "url": url,
        "screenshot": True,
        "wait": 10,
        "wait_until": "domcontentloaded",
        "timeout": 30000  # Page load timeout in milliseconds
    }

    response = requests.post(api_url, json=data, headers=headers)
    response.raise_for_status()

    return response.json()


# Usage Example
API_TOKEN = "YOUR_API_TOKEN"
TARGET_URL = "https://example.com"

# First create a profile
profile = create_profile(API_TOKEN)
internal_uuid = profile['internal_uuid']

# Then use it to scrape
result = scrape_page(API_TOKEN, internal_uuid, TARGET_URL)

# Access the results
html_content = result['data']['content']
cookies = result['data']['cookies']

# Save screenshot if requested
if result['data']['screenshot']:
    with open('screenshot.png', 'wb') as f:
        f.write(base64.b64decode(result['data']['screenshot']))

Scraping Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | string | required | The URL to scrape |
| `screenshot` | boolean | `false` | Whether to capture a screenshot |
| `wait` | number | `0` | Time to wait after page load (0-60 seconds) |
| `wait_until` | string | `"domcontentloaded"` | When to consider navigation complete: `"domcontentloaded"`, `"load"`, `"networkidle"`, or `"commit"` |
| `wait_for` | string | `null` | XPath or CSS selector to wait for on the page |
| `timeout` | number | `30000` | Maximum time to wait for page load, in milliseconds |
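Putting the parameters together, a request body might look like the following. This is a hypothetical example: the target URL and CSS selector are illustrative, not taken from the SurfSky docs.

```python
import json

# Example request body combining the scraping parameters above.
payload = {
    "url": "https://example.com/products",  # illustrative target
    "screenshot": False,
    "wait": 5,                        # seconds to wait after page load
    "wait_until": "networkidle",      # wait until network activity settles
    "wait_for": "div.product-list",   # illustrative CSS selector to wait for
    "timeout": 45000,                 # page load timeout in milliseconds
}
print(json.dumps(payload, indent=2))
```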

Request Queue and Rate Limits

The scraping API includes a built-in queue system to handle concurrent requests:

  • Queue Size: Maximum of 5 requests can be queued per profile
  • Processing: Requests are processed sequentially (FIFO - First In, First Out)
  • Rate Limiting: If a request arrives while the queue is already full (5 requests waiting), a 429 Too Many Requests error is returned
  • Concurrent Profiles: You can run multiple profiles simultaneously according to your subscription plan
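One way to cope with the 429 responses described above is exponential backoff. The sketch below is generic retry logic, not part of the SurfSky client; the helper name and the `(status, body)` return shape of `send` are assumptions for illustration.

```python
import time


def with_retry_on_429(send, max_retries: int = 3, backoff: float = 1.0):
    """Call `send()` (which returns an HTTP status code and a body),
    retrying with exponential backoff while it returns 429 (queue full)."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        # Queue full: wait before retrying, doubling the delay each time.
        time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("Scrape queue still full after retries")


# Demo with a fake sender: two 429 responses, then success.
responses = iter([(429, None), (429, None), (200, "<html>...</html>")])
status, body = with_retry_on_429(lambda: next(responses), backoff=0.01)
print(status)  # 200
```

In production, `send` would wrap the `requests.post` call to the scrape endpoint and return `(response.status_code, response.json())`.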

Important Notes

  • You need to first create a profile using the profiles/one_time endpoint to get the internal_uuid
  • Screenshots are returned as base64-encoded strings
  • Cookies from the session are included in the response
  • Always remember to close the browser when you're done to release your session limit. Inactive sessions are automatically closed after 30 seconds; set inactive_kill_timeout to change this value.
  • A one-time profile is used only once and then deleted; use persistent profiles for long-term sessions
  • A proxy is required for scraping and must be passed to the create_profile function
  • You can run multiple sessions according to your subscription plan's session limit
  • The timeout parameter prevents hanging on slow or unresponsive pages
  • When the queue is full (5 requests waiting), additional requests will receive a 429 error immediately
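The notes above mention the inactive_kill_timeout setting. A profile-creation body that raises it might look like the sketch below; the field's placement in this request and its unit (seconds) are assumptions to confirm against the API Reference.

```python
# Sketch of a one-time profile request body with a longer inactivity timeout.
profile_request = {
    "proxy": "socks5://user:pass@203.0.113.10:1080",  # placeholder proxy
    "inactive_kill_timeout": 60,  # assumed unit: seconds before an idle session is closed
}
print(profile_request["inactive_kill_timeout"])
```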

For more advanced usage and error handling, check out our API Reference.