Scraping API
Using Scraping API
Prerequisites
To use SurfSky, you'll need the following:
- API Key - A unique authentication token for accessing our services
- Assigned Hostname - Your dedicated Surfsky endpoint
- Proxies - Proxy URLs in the format protocol://username:password@host:port. Supported protocols: HTTP, HTTPS, SOCKS5, SSH
To obtain your API key and hostname, or if you need proxies, please contact our team.
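Before sending a proxy string to the API, it can help to check client-side that it matches the protocol://username:password@host:port format described above. The following is a minimal sketch using only the Python standard library. The helper name validate_proxy and the lowercase scheme names are assumptions made for illustration; they are not part of the SurfSky API.

```python
from urllib.parse import urlsplit

# Assumption: scheme names are the lowercase forms of the protocols listed above.
SUPPORTED_PROTOCOLS = {"http", "https", "socks5", "ssh"}


def validate_proxy(proxy: str) -> dict:
    """Split a proxy string of the form protocol://username:password@host:port.

    Raises ValueError if a component is missing or the protocol is not listed.
    """
    parts = urlsplit(proxy)
    if parts.scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"Unsupported protocol: {parts.scheme!r}")
    if not parts.username or not parts.password:
        raise ValueError("Proxy must include a username and password")
    if not parts.hostname or parts.port is None:
        raise ValueError("Proxy must include a host and port")
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }
```

Calling validate_proxy on a malformed string raises ValueError locally, which surfaces the problem before any request is made.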
Code Example
- Python
- JavaScript
import base64

import requests


def create_profile(api_token: str, proxy: str | None = None) -> dict:
    """Create a one-time browser profile"""
    url = "https://api-public.surfsky.io/profiles/one_time"
    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }
    data = {}
    if proxy:
        data["proxy"] = proxy
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    return response.json()


def scrape_page(api_token: str, internal_uuid: str, url: str) -> dict:
    """Scrape a webpage using SurfSky's Scraping API"""
    api_url = f"https://api-public.surfsky.io/profiles/{internal_uuid}/scrape"
    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token
    }
    data = {
        "url": url,
        "screenshot": True,
        "wait": 10,  # Seconds to wait after page load
        "wait_until": "domcontentloaded",
        "timeout": 30000  # Page load timeout in milliseconds
    }
    response = requests.post(api_url, json=data, headers=headers)
    response.raise_for_status()
    return response.json()


# Usage Example
API_TOKEN = "YOUR_API_TOKEN"
PROXY = "protocol://username:password@host:port"  # Replace with your proxy
TARGET_URL = "https://example.com"

# First create a profile (a proxy is required)
profile = create_profile(API_TOKEN, PROXY)
internal_uuid = profile['internal_uuid']

# Then use it to scrape
result = scrape_page(API_TOKEN, internal_uuid, TARGET_URL)

# Access the results
html_content = result['data']['content']
cookies = result['data']['cookies']

# Save screenshot if present (returned as a base64-encoded string)
screenshot = result['data'].get('screenshot')
if screenshot:
    with open('screenshot.png', 'wb') as f:
        f.write(base64.b64decode(screenshot))

const axios = require('axios');
const fs = require('fs');

async function createProfile(apiToken, proxy = null) {
  const url = 'https://api-public.surfsky.io/profiles/one_time';
  const headers = {
    "Content-Type": "application/json",
    "X-Cloud-Api-Token": apiToken
  };
  const data = {};
  if (proxy) {
    data.proxy = proxy;
  }
  const response = await axios.post(url, data, { headers });
  return response.data;
}

async function scrapePage(apiToken, internalUuid, url) {
  const apiUrl = `https://api-public.surfsky.io/profiles/${internalUuid}/scrape`;
  const headers = {
    "Content-Type": "application/json",
    "X-Cloud-Api-Token": apiToken
  };
  const data = {
    url: url,
    screenshot: true,
    wait: 10, // Seconds to wait after page load
    wait_until: "domcontentloaded",
    timeout: 30000 // Page load timeout in milliseconds
  };
  const response = await axios.post(apiUrl, data, { headers });
  return response.data;
}

// Usage Example
const API_TOKEN = 'YOUR_API_TOKEN';
const PROXY = 'protocol://username:password@host:port'; // Replace with your proxy
const TARGET_URL = 'https://example.com';

async function main() {
  try {
    // First create a profile (a proxy is required)
    const profile = await createProfile(API_TOKEN, PROXY);
    const internalUuid = profile.internal_uuid;

    // Then use it to scrape
    const result = await scrapePage(API_TOKEN, internalUuid, TARGET_URL);

    // Access the results
    const htmlContent = result.data.content;
    const cookies = result.data.cookies;

    // Save screenshot if present (returned as a base64-encoded string)
    if (result.data.screenshot) {
      const buffer = Buffer.from(result.data.screenshot, 'base64');
      fs.writeFileSync('./screenshot.png', buffer);
    }
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

Scraping Parameters
Parameter | Type | Default | Description |
---|---|---|---|
url | string | required | The URL to scrape |
screenshot | boolean | false | Whether to capture a screenshot |
wait | number | 0 | Time to wait after page load (0-60 seconds) |
wait_until | string | "domcontentloaded" | When to consider navigation complete: "domcontentloaded", "load", "networkidle", or "commit" |
wait_for | string | null | XPath or CSS selector to wait for on the page |
timeout | number | 30000 | Maximum time to wait for page load in milliseconds |
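
The defaults and limits in the table above can be enforced on the client before a request is sent. The sketch below builds a request body from those rules. The function name build_scrape_payload is hypothetical, and omitting wait_for when no selector is given is an assumption about how the API treats a missing field.

```python
from typing import Optional

# Values listed for wait_until in the parameter table.
ALLOWED_WAIT_UNTIL = {"domcontentloaded", "load", "networkidle", "commit"}


def build_scrape_payload(
    url: str,
    screenshot: bool = False,
    wait: float = 0,
    wait_until: str = "domcontentloaded",
    wait_for: Optional[str] = None,
    timeout: int = 30000,
) -> dict:
    """Build a scrape request body using the documented defaults and limits."""
    if not url:
        raise ValueError("url is required")
    if not 0 <= wait <= 60:
        raise ValueError("wait must be between 0 and 60 seconds")
    if wait_until not in ALLOWED_WAIT_UNTIL:
        raise ValueError(f"wait_until must be one of {sorted(ALLOWED_WAIT_UNTIL)}")

    payload = {
        "url": url,
        "screenshot": screenshot,
        "wait": wait,  # seconds
        "wait_until": wait_until,
        "timeout": timeout,  # milliseconds
    }
    # Assumption: wait_for can simply be left out when no selector is needed.
    if wait_for is not None:
        payload["wait_for"] = wait_for
    return payload
```

The returned dictionary has the same shape as the data dictionary passed to requests.post in the Python example.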
Request Queue and Rate Limits
The scraping API includes a built-in queue system to handle concurrent requests:
- Queue Size: A maximum of 5 requests can be queued per profile
- Processing: Requests are processed sequentially (FIFO - First In, First Out)
- Rate Limiting: If a request arrives while the queue is already full (5 requests waiting), a 429 Too Many Requests error is returned
- Concurrent Profiles: You can run multiple profiles simultaneously according to your subscription plan
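
Because a full queue answers with 429 right away, a client may want to back off and retry rather than fail. The sketch below shows one way to do that with exponential backoff. The helper name post_with_retry, the attempt count, and the delay values are illustrative assumptions and not SurfSky recommendations. The request itself is passed in as a callable so the helper works with any HTTP client.

```python
import time
from typing import Any, Callable


def post_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-429 response or attempts run out.

    send is any zero-argument callable that returns an object with a
    status_code attribute (for example, a requests.Response).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Queue was full: wait base_delay, then 2x, 4x, ... before retrying.
            sleep(base_delay * 2 ** attempt)
    # Still 429 after the final attempt; the caller decides how to handle it.
    return response
```

With the requests library, the scrape call could be wrapped as post_with_retry(lambda: requests.post(api_url, json=data, headers=headers)), followed by response.raise_for_status() on the result.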
Important Notes
- You need to first create a profile using the profiles/one_time endpoint to get the internal_uuid
- Screenshots are returned as base64-encoded strings
- Cookies from the session are included in the response
- Always close the browser when you're done so the session no longer counts against your session limit. Inactive sessions are automatically closed after 30 seconds. Set inactive_kill_timeout to change this value.
- A one-time profile is used only once and then deleted. Use persistent profiles for long-term sessions
- A proxy is required and must be passed to the create_profile function
- You can run multiple sessions according to your subscription plan's session limit
- The timeout parameter prevents hanging on slow or unresponsive pages
- When the queue is full (5 requests waiting), additional requests will receive a 429 error immediately
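
The notes on screenshots and cookies can be combined into a small helper that unpacks a scrape response. This is a sketch that assumes the response layout used in the code example (a top-level data object with content, cookies, and an optional base64 screenshot string). The function name unpack_scrape_result is hypothetical.

```python
import base64
from typing import Any, Optional, Tuple


def unpack_scrape_result(result: dict) -> Tuple[str, Any, Optional[bytes]]:
    """Return (html, cookies, screenshot_bytes) from a scrape response body."""
    data = result["data"]
    screenshot_b64 = data.get("screenshot")
    # Screenshots arrive as base64 text; decode to raw image bytes when present.
    screenshot = base64.b64decode(screenshot_b64) if screenshot_b64 else None
    return data["content"], data.get("cookies"), screenshot
```

The decoded bytes can be written directly to a file opened in binary mode, as in the usage example.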
For more advanced usage and error handling, check out our API Reference.