Scraping API
Using Scraping API
Prerequisites
To use SurfSky, you'll need the following:
- API Key - A unique authentication token for accessing our services
- Assigned Hostname - Your dedicated SurfSky endpoint
- Proxies - Proxies in the format `protocol://username:password@host:port`. Supported protocols: HTTP, HTTPS, SOCKS5, SSH
To obtain your API key and hostname, or if you need proxies, please contact our team.
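Proxy strings must follow the `protocol://username:password@host:port` format listed above. The Python sketch below shows one way to assemble and sanity-check such a string before passing it to `create_profile`. The helper names `build_proxy_url` and `check_proxy_url` are hypothetical, and percent-encoding special characters in the credentials is an assumption about what the API accepts (it is standard URL behavior, but verify it if your password contains characters such as `@` or `:`).

```python
from __future__ import annotations

from urllib.parse import SplitResult, quote, urlsplit

# Protocols listed in the prerequisites (compared in lowercase)
SUPPORTED_PROTOCOLS = {"http", "https", "socks5", "ssh"}


def build_proxy_url(protocol: str, username: str, password: str, host: str, port: int) -> str:
    """Assemble protocol://username:password@host:port (hypothetical helper)."""
    protocol = protocol.lower()
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"Unsupported proxy protocol: {protocol!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")
    # Percent-encode the credentials so characters like '@' or ':' cannot be
    # mistaken for URL delimiters (assumption: the API decodes them).
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{protocol}://{user}:{pwd}@{host}:{port}"


def check_proxy_url(proxy: str) -> SplitResult:
    """Raise ValueError if proxy does not match the expected format."""
    parts = urlsplit(proxy)  # urlsplit lowercases the scheme
    if parts.scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"Unsupported proxy protocol: {parts.scheme!r}")
    # Accessing .port raises ValueError for non-numeric or out-of-range ports
    if not (parts.username and parts.password and parts.hostname and parts.port):
        raise ValueError("Proxy must look like protocol://username:password@host:port")
    return parts
```

For example, `build_proxy_url("http", "user", "p@ss", "proxy.example.com", 8080)` yields `http://user:p%40ss@proxy.example.com:8080`, which `check_proxy_url` accepts.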
Code Example
Python

```python
import base64

import requests


def create_profile(api_token: str, proxy: str | None = None) -> dict:
    """Create a one-time browser profile."""
    url = "https://api-public.surfsky.io/profiles/one_time"
    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token,
    }
    data = {}
    if proxy:
        data["proxy"] = proxy
    # timeout is a client-side limit in seconds; adjust as needed
    response = requests.post(url, headers=headers, json=data, timeout=60)
    response.raise_for_status()
    return response.json()


def scrape_page(api_token: str, internal_uuid: str, url: str) -> dict:
    """Scrape a webpage using SurfSky's Scraping API."""
    api_url = f"https://api-public.surfsky.io/profiles/{internal_uuid}/scrape"
    headers = {
        "Content-Type": "application/json",
        "X-Cloud-Api-Token": api_token,
    }
    data = {
        "url": url,
        "screenshot": True,
        "wait": 10,
        "wait_until": "domcontentloaded",
    }
    response = requests.post(api_url, json=data, headers=headers, timeout=60)
    response.raise_for_status()
    return response.json()


# Usage example
API_TOKEN = "YOUR_API_TOKEN"
PROXY = "http://username:password@host:port"  # replace with your proxy
TARGET_URL = "https://example.com"

# First, create a one-time profile (a proxy is required)
profile = create_profile(API_TOKEN, PROXY)
internal_uuid = profile["internal_uuid"]

# Then use it to scrape
result = scrape_page(API_TOKEN, internal_uuid, TARGET_URL)

# Access the results
html_content = result["data"]["content"]
cookies = result["data"]["cookies"]

# Save the screenshot if one was returned
screenshot_b64 = result["data"].get("screenshot")
if screenshot_b64:
    with open("screenshot.png", "wb") as f:
        f.write(base64.b64decode(screenshot_b64))
```
JavaScript

```javascript
const axios = require('axios');
const fs = require('fs');

// Create a one-time browser profile
async function createProfile(apiToken, proxy = null) {
  const url = 'https://api-public.surfsky.io/profiles/one_time';
  const headers = {
    'Content-Type': 'application/json',
    'X-Cloud-Api-Token': apiToken,
  };
  const data = {};
  if (proxy) {
    data.proxy = proxy;
  }
  // timeout is a client-side limit in milliseconds; adjust as needed
  const response = await axios.post(url, data, { headers, timeout: 60000 });
  return response.data;
}

// Scrape a webpage using SurfSky's Scraping API
async function scrapePage(apiToken, internalUuid, url) {
  const apiUrl = `https://api-public.surfsky.io/profiles/${internalUuid}/scrape`;
  const headers = {
    'Content-Type': 'application/json',
    'X-Cloud-Api-Token': apiToken,
  };
  const data = {
    url: url,
    screenshot: true,
    wait: 10,
    wait_until: 'domcontentloaded',
  };
  const response = await axios.post(apiUrl, data, { headers, timeout: 60000 });
  return response.data;
}

// Usage example
const API_TOKEN = 'YOUR_API_TOKEN';
const PROXY = 'http://username:password@host:port'; // replace with your proxy
const TARGET_URL = 'https://example.com';

async function main() {
  try {
    // First, create a one-time profile (a proxy is required)
    const profile = await createProfile(API_TOKEN, PROXY);
    const internalUuid = profile.internal_uuid;

    // Then use it to scrape
    const result = await scrapePage(API_TOKEN, internalUuid, TARGET_URL);

    // Access the results
    const htmlContent = result.data.content;
    const cookies = result.data.cookies;

    // Save the screenshot if one was returned
    if (result.data.screenshot) {
      const buffer = Buffer.from(result.data.screenshot, 'base64');
      fs.writeFileSync('./screenshot.png', buffer);
    }
  } catch (error) {
    // For HTTP errors, axios attaches the server's reply to error.response
    console.error('Error:', error.response ? error.response.data : error.message);
    process.exitCode = 1;
  }
}

main();
```
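Both usage examples read the screenshot from `result['data']['screenshot']` and always write it to a `.png` file. This page does not state which image format the screenshot uses, so the Python sketch below factors that step into a hypothetical `save_screenshot` helper that decodes the base64 string strictly and picks the file extension from the decoded bytes' signature, falling back to `.bin`. The helper name and the signature table are illustrative assumptions; only the `result['data']['screenshot']` field comes from the examples above.

```python
from __future__ import annotations

import base64
import binascii
from pathlib import Path

# Magic-byte signatures for common image formats
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
}


def save_screenshot(result: dict, path_stem: str = "screenshot") -> Path | None:
    """Decode result['data']['screenshot'] and write it to disk.

    Returns the written path, or None if the response has no screenshot.
    """
    encoded = result.get("data", {}).get("screenshot")
    if not encoded:
        return None
    try:
        # validate=True rejects non-base64 characters instead of skipping them
        image = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("Screenshot is not valid base64") from exc
    # Pick the extension from the leading bytes; fall back to .bin if unknown
    suffix = next((ext for sig, ext in _SIGNATURES.items() if image.startswith(sig)), ".bin")
    path = Path(path_stem).with_suffix(suffix)
    path.write_bytes(image)
    return path
```

With the Python example above, `save_screenshot(result)` would replace the final `if` block.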
Important Notes
- You must first create a profile using the `profiles/one_time` endpoint to get the `internal_uuid`.
- Screenshots are returned as base64-encoded strings.
- Cookies from the session are included in the response.
- Always close the browser when you're done so the session no longer counts against your session limit. Inactive sessions are closed automatically after 30 seconds; set `inactive_kill_timeout` to change this value.
- A one-time profile is used only once and then deleted. Use persistent profiles for long-term sessions.
- A proxy is required and must be passed to the `create_profile` function.
- You can run multiple sessions at once, up to your subscription plan's session limit.
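Two of the notes above, closing each browser when you're done and staying within your plan's session limit, fit a single pattern: run each scrape inside a `try`/`finally` that releases the session, and cap how many run at once. The Python sketch below does this with a thread pool sized to the session limit. This page does not document the close call, so `open_session`, `scrape_one`, and `close_session` are hypothetical callables you supply (for example, wrappers around `create_profile`, `scrape_page`, and whatever close operation the API Reference describes), and `SESSION_LIMIT` is a placeholder for your plan's value.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

SESSION_LIMIT = 3  # placeholder: set to your subscription plan's session limit


def scrape_many(
    urls: Iterable[str],
    open_session: Callable[[], str],
    scrape_one: Callable[[str, str], dict],
    close_session: Callable[[str], None],
    session_limit: int = SESSION_LIMIT,
) -> list[dict]:
    """Scrape each URL in its own session, never exceeding session_limit at once."""

    def run(url: str) -> dict:
        session_id = open_session()  # e.g. create a one-time profile
        try:
            return scrape_one(session_id, url)
        finally:
            # Always release the session, even if scraping raised
            close_session(session_id)

    # max_workers bounds how many sessions can be open concurrently
    with ThreadPoolExecutor(max_workers=session_limit) as pool:
        # map preserves input order; the first job exception is re-raised here
        return list(pool.map(run, urls))
```

A thread pool suits this workload because each job spends most of its time waiting on HTTP responses rather than using the CPU.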
For more advanced usage and error handling, check out our API Reference.