Speed Optimization
Surfsky provides powerful capabilities for running multiple browser instances in parallel to optimize your automation workflows. Thanks to our advanced Chromium core-level optimizations and enterprise-grade Kubernetes infrastructure, you can scale up to 1,000 concurrent browser instances (depending on your subscription plan) while maintaining stability and performance.
Each browser instance operates in complete isolation, with its own unique fingerprint and characteristics. This means that anti-bot systems can only correlate browser instances based on your automation patterns and behaviors, not through any inherent browser signatures or fingerprints. This architectural design provides several key advantages:
- True Browser Isolation: Each instance maintains its own independent state and fingerprint
- Resource Optimization: Chromium-level optimizations ensure efficient RAM and CPU usage
- High Reliability: Kubernetes orchestration ensures stable operation at scale
- Flexible Scaling: Easily scale from a few browsers to hundreds based on your needs
Here's how to leverage these features effectively.
Basic Parallelization Example
If you're interested in specific implementation examples or need code for particular use cases, please contact our team. We're happy to provide additional examples and guidance tailored to your needs.
Here's a simplified example of how to implement parallel processing:
```python
import asyncio
from dataclasses import dataclass

from core.executor import TaskExecutor
from core.config import ExecutorConfig
from core.pipeline.types import BaseTask
from core.pipeline.result import Result


@dataclass
class ScrapedData:
    title: str
    status: int


class WebScraper(BaseTask):
    async def main(self, browser, url) -> Result[ScrapedData]:
        try:
            async with browser.managed_page() as page:
                response = await page.goto(url)
                title = await page.title()
                return Result.success(ScrapedData(
                    title=title,
                    status=response.status,
                ))
        except Exception as e:
            return Result.failure(f"Error: {e}")


async def main():
    # Configure parallel execution
    config = ExecutorConfig(
        browser_count=10,           # Number of parallel browsers
        max_browser_tasks=5,        # Tasks per browser before recycling
        max_task_attempts=3,        # Retry attempts per task
        fingerprint={"os": "mac"},  # Browser fingerprint
    )

    # Initialize scraper and executor
    scraper = WebScraper()
    executor = TaskExecutor(config)

    # Your URLs to process
    urls = [
        "https://example1.com",
        "https://example2.com",
        # ... more URLs
    ]

    # Execute tasks in parallel
    results, metrics = await executor.execute(urls, scraper)


asyncio.run(main())
```
Optimization Strategies
1. Browser Pool Management
```python
# Configure optimal browser pool size
config = ExecutorConfig(
    browser_count=10,        # Adjust based on system resources
    max_browser_tasks=5,     # Balance between reuse and freshness
    max_browser_attempts=3,  # Retry limit for browser issues
    task_timeout=30,         # Timeout for individual tasks
    attempt_delay=2,         # Delay between retry attempts
)
```
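Choosing `browser_count` is a trade-off between throughput, your plan's concurrent-browser limit, and how quickly the batch must finish. As a rough sizing heuristic (not part of the Surfsky API; the function name and parameters here are illustrative), you can estimate the pool size from the batch size, average task duration, and your deadline:

```python
import math

# Illustrative sizing helper (not part of the Surfsky SDK): estimate how many
# parallel browsers are needed to finish a batch within a deadline, capped by
# your plan's concurrent-browser limit.
def needed_browser_count(total_tasks: int,
                         avg_task_seconds: float,
                         deadline_seconds: float,
                         plan_limit: int = 100) -> int:
    required = math.ceil(total_tasks * avg_task_seconds / deadline_seconds)
    return max(1, min(required, plan_limit))

# e.g. 1,000 tasks at ~6 s each, to be done within 10 minutes -> 10 browsers
```

Start from an estimate like this, then adjust based on observed success rates and resource usage.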
2. Proxy Rotation
```python
from itertools import cycle, islice

def get_proxies(count: int) -> list[str]:
    countries = ["US", "UK", "DE", "FR"]  # Target countries
    # Cycle through the target countries until the pool reaches the requested size
    return [
        f"socks5://user:[email protected]:1080?country={country}"
        for country in islice(cycle(countries), count)
    ]

config = ExecutorConfig(
    browser_count=10,
    proxies=get_proxies(20),  # Maintain a proxy pool larger than the browser count
)
```
3. Error Handling and Retries
```python
import asyncio

class ResilientScraper(BaseTask):
    async def main(self, browser, url) -> Result:
        try:
            async with browser.managed_page() as page:
                # Add custom retry logic
                for attempt in range(3):
                    try:
                        await page.goto(url, timeout=10000)
                        return Result.success(await self.extract_data(page))
                    except Exception:
                        if attempt == 2:  # Last attempt
                            raise
                        await asyncio.sleep(2)  # Delay between attempts
        except Exception as e:
            return Result.failure(str(e))
```
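The same retry pattern can be factored into a reusable helper. This is a generic sketch, not part of the Surfsky SDK: it retries any async operation with exponential backoff plus a small jitter, and re-raises the last error once attempts are exhausted.

```python
import asyncio
import random

# Generic retry helper (illustrative, not a Surfsky API): run an async
# operation, backing off exponentially between failed attempts.
async def with_retries(operation, attempts: int = 3,
                       base_delay: float = 2.0, max_delay: float = 30.0):
    for attempt in range(attempts):
        try:
            return await operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Jitter avoids many browsers retrying in lock-step
            await asyncio.sleep(delay + random.uniform(0, base_delay))
```

A task's `main` can then wrap `page.goto` (or any flaky step) in `with_retries` instead of hand-rolling the loop each time.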
Best Practices
Resource Management
- Monitor the rate-limiting headers returned with each response:
```
x-ratelimit-limit: 200            # Maximum requests per minute
x-ratelimit-limit-hour: 3000      # Maximum requests per hour
x-ratelimit-remaining: 198        # Remaining requests this minute
x-ratelimit-remaining-hour: 2998  # Remaining requests this hour
```
- Monitor the active browser count using the /active or /profiles endpoint
- If you exceed your plan's browser limit, you'll receive a 429 (Too Many Requests) error
- If you encounter errors, provide the tracing UUID from the response headers to support:
```
x-cloud-tracing-uuid: fea367e7cfc840818508754b5f1c1f51
```
- Use `max_browser_tasks` to recycle browsers periodically
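A client can react to these headers before hitting the 429. The sketch below (header names taken from the examples above; the function itself is illustrative, not part of the SDK) pauses for the rest of the rate window once the per-minute budget is exhausted:

```python
# Illustrative throttling decision based on the x-ratelimit-* headers above:
# returns how many seconds to wait before issuing the next request.
def throttle_delay(headers: dict[str, str], window_seconds: int = 60) -> float:
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0          # Budget left: proceed immediately
    return float(window_seconds)  # Budget exhausted: wait out the minute window
```

Calling this on each response and sleeping for the returned duration keeps the client under the per-minute limit proactively.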
Browser Lifecycle Management
Browsers can be closed in several ways:
Automatic Closure
- Browsers are automatically closed after being inactive for `inactive_kill_timeout` seconds
- This helps prevent resource wastage from forgotten sessions
Automation Framework Methods
- Using `browser.close()` in Playwright/Puppeteer
- These methods internally send CDP (Chrome DevTools Protocol) commands to close the WebSocket connection
API Endpoints
- Browser sessions can also be terminated through Surfsky's API
Always properly close your browser sessions when done to maintain optimal resource usage and stay within your plan limits.
Proxy Strategy
- Maintain a larger proxy pool than browser count
- Rotate proxies based on geographic needs
- Monitor proxy health and performance
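Monitoring proxy health can be as simple as counting failures per proxy and excluding proxies that cross a threshold. The class below is an illustrative sketch (not a Surfsky feature); the names and threshold are assumptions:

```python
from collections import defaultdict

# Minimal proxy-health tracker: record failures per proxy and hand out
# only proxies that haven't crossed the failure threshold.
class ProxyPool:
    def __init__(self, proxies: list[str], max_failures: int = 3):
        self.proxies = proxies
        self.max_failures = max_failures
        self.failures = defaultdict(int)

    def healthy(self) -> list[str]:
        return [p for p in self.proxies
                if self.failures[p] < self.max_failures]

    def report_failure(self, proxy: str) -> None:
        self.failures[proxy] += 1
```

Feeding only `pool.healthy()` into `ExecutorConfig(proxies=...)` on each batch keeps degraded proxies out of rotation.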
Error Handling
- Implement graceful retries
- Add appropriate delays between attempts
- Log and monitor failure patterns
Performance Monitoring
```python
# Track execution metrics
print(f"Success rate: {(metrics.completed / metrics.total) * 100:.1f}%")
print(f"Average processing time: {metrics.avg_time:.2f}s")
print(f"Failed tasks: {metrics.failed}")
```
Common Pitfalls
Over-parallelization
- Exceeding your plan's concurrent browser limit will trigger rate limiting (429 error)
- Can trigger rate limiting from target sites
Insufficient Error Handling
- Not accounting for network issues
- Missing retry logic for temporary failures
Poor Resource Management
- Not recycling browsers after heavy use
- Memory leaks from unclosed resources
Remember to monitor your automation's performance and adjust these parameters based on your specific use case and target website's requirements.
Understanding Network Latency
Round Trip Time (RTT) Considerations
When moving from local development to Surfsky's cloud infrastructure, it's important to understand how network latency affects your automation:
Local vs Cloud Execution
- Local development has near-instant RTT
- Cloud execution involves network latency for each request
- Additional services (like CAPTCHA solving or proxies) add extra latency
CDP Command Overhead
- Each page interaction may trigger multiple CDP commands
- Sequential operations accumulate RTT
- What seems fast locally may be slower in production
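A back-of-the-envelope model makes the accumulation concrete (the RTT figures below are assumed for illustration): sequential CDP commands pay the round trip once per command, while commands issued concurrently pay it roughly once overall.

```python
# Illustrative cost model for CDP command latency.
def sequential_cost_ms(commands: int, rtt_ms: float) -> float:
    # Each command waits for the previous one: RTTs add up
    return commands * rtt_ms

def parallel_cost_ms(commands: int, rtt_ms: float) -> float:
    # All commands in flight at once share roughly one round trip
    return rtt_ms

# e.g. 50 commands at 40 ms RTT: 2000 ms sequential vs ~40 ms parallel
```

This is why a script that feels instant against a local browser (sub-millisecond RTT) can slow down noticeably against a remote one.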
Optimization Strategies
Implement Parallel Processing
Instead of sequential operations, batch independent requests together:
```python
# Slower: Sequential processing
for element in elements:
    await get_element_text(element)

# Faster: Parallel processing
await asyncio.gather(
    *(get_element_text(element) for element in elements)
)
```
Choose the Right Framework
- WebSocket-based frameworks (Playwright, Puppeteer) offer better performance
- HTTP-based frameworks (Selenium) may have higher latency
- Consider low-level frameworks for maximum performance
Framework-Specific Optimizations
Example for Playwright:
```python
# Slower: Simulates keystrokes
await page.keyboard.type("text")

# Faster: Direct value setting
await page.locator("input").fill("text")
```
Geographic Optimization
- Use browsers in regions close to your infrastructure
- Regional proximity can improve performance by 8-9x
- Consider multi-region deployment for global operations
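The arithmetic behind the proximity gains is straightforward (the RTT values below are assumed, illustrative numbers): a form fill that issues a few dozen sequential CDP commands multiplies every millisecond of RTT.

```python
# Illustrative only: a form fill issuing 30 sequential CDP commands,
# with assumed same-region vs cross-region round-trip times.
def total_ms(commands: int, rtt_ms: float) -> float:
    return commands * rtt_ms

near = total_ms(30, 5)    # same-region RTT ~5 ms   -> 150 ms total
far = total_ms(30, 45)    # cross-region RTT ~45 ms -> 1350 ms total
speedup = far / near      # ~9x from proximity alone
```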
When designing your automation workflow, always consider the impact of network latency and implement parallel processing where possible to achieve optimal performance.
Running your automation close to Surfsky's infrastructure can lead to dramatic performance improvements, especially for operations requiring multiple CDP commands like form filling and submissions.