Overview
This guide covers production scaling strategies for Automagik Spark. Learn how to horizontally scale workers, optimize database connections, manage Redis memory, and configure task queues for high-throughput workflow execution.
Worker Scaling
Horizontal Scaling with Docker Compose
Scale workers to handle increased task volume:
Scale Workers Dynamically
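A minimal example, assuming your Compose file names the worker service `worker` (check docker-compose.yml for the actual name): run `docker compose up -d --scale worker=5` to start five worker containers against the same broker, and `docker compose up -d --scale worker=2` to scale back down.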
Worker Concurrency Configuration
Control how many tasks each worker processes concurrently (see the configuration sketch after the table):
| Configuration | Threads | Workers | Total Capacity (threads × workers) | Use Case |
|---|---|---|---|---|
| Low throughput | 2 | 1 | 2 tasks | Development, testing |
| Medium throughput | 4 | 2 | 8 tasks | Small production |
| High throughput | 4 | 5 | 20 tasks | Medium production |
| Very high throughput | 8 | 10 | 80 tasks | Large production |
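A minimal sketch of the "Medium throughput" row, assuming a Celery application object named `app` (the app name and broker URL are placeholders):

```python
from celery import Celery

app = Celery("spark", broker="redis://localhost:6379/0")  # placeholder broker URL

# 4 concurrent slots per worker (the "Threads" column); run 2 such
# worker containers to reach the table's total capacity of 8 tasks.
app.conf.worker_concurrency = 4
app.conf.worker_pool = "threads"  # thread pool instead of the default prefork
```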
Worker Configuration Options
- worker_prefetch_multiplier: 1 - Each worker takes one task at a time. Prevents uneven load distribution.
- worker_max_tasks_per_child: 100 - Worker restarts after 100 tasks. Prevents memory leaks in long-running processes.
- worker_max_memory_per_child: 200000 - Worker restarts if it exceeds 200MB memory. Protects against runaway memory usage.
- task_track_started: True - Tasks show “running” status immediately, not just “queued”. Better visibility.
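A hedged sketch applying all four settings at once (the app name and broker URL are placeholders):

```python
from celery import Celery

app = Celery("spark", broker="redis://localhost:6379/0")  # placeholder broker URL

app.conf.update(
    worker_prefetch_multiplier=1,         # take one task at a time
    worker_max_tasks_per_child=100,       # recycle the worker after 100 tasks
    worker_max_memory_per_child=200_000,  # in KiB: recycle above ~200MB
    task_track_started=True,              # report "started" instead of only "queued"
)
```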
Monitoring Worker Health
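A minimal liveness probe using Celery's inspection API (placeholder app and broker URL):

```python
from celery import Celery

app = Celery("spark", broker="redis://localhost:6379/0")  # placeholder broker URL

insp = app.control.inspect(timeout=5)
print(insp.ping())   # {'celery@worker1': {'ok': 'pong'}, ...} for each live worker
print(insp.stats())  # per-worker pool size, task counters, and resource usage
```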
Database Connection Pooling
PostgreSQL Pool Size Tuning
SQLAlchemy creates a connection pool for each process. Configure it based on your worker count:
Configure PostgreSQL
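A sketch using the production column of the table below; the connection URL is a placeholder:

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://spark:password@db:5432/spark",  # placeholder URL
    pool_size=10,       # steady-state connections held per process
    max_overflow=20,    # extra connections allowed during bursts
    pool_timeout=30,    # seconds to wait for a free connection before erroring
    pool_recycle=3600,  # recycle connections hourly to avoid stale sockets
)
```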
Use PgBouncer for Connection Pooling
- Reduces PostgreSQL connection overhead
- Better resource utilization
- Handles connection spikes gracefully
- Faster connection establishment
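When PgBouncer pools in transaction mode, let it own the pooling and disable SQLAlchemy's: connect to PgBouncer's port (6432 by default) with NullPool. A sketch with a placeholder URL:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(
    "postgresql+psycopg2://spark:password@pgbouncer:6432/spark",  # placeholder URL
    poolclass=NullPool,  # PgBouncer multiplexes connections; don't double-pool
)
```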
Database Connection Best Practices
| Parameter | Development | Production | High Load |
|---|---|---|---|
| pool_size | 5 | 10 | 20 |
| max_overflow | 10 | 20 | 40 |
| pool_timeout | 30 | 30 | 60 |
| pool_recycle | 3600 | 3600 | 1800 |
Redis Memory Management
Configure Redis Memory Limits
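A runtime sketch with redis-py; persist the same values in redis.conf, since CONFIG SET does not survive a restart:

```python
import redis

r = redis.Redis(host="localhost", port=6379)
r.config_set("maxmemory", "2gb")                 # e.g. the "Medium" tier below
r.config_set("maxmemory-policy", "allkeys-lru")  # evict least recently used keys
```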
Memory Eviction Policies
| Policy | Behavior | Use Case |
|---|---|---|
| allkeys-lru | Evict least recently used keys | Recommended for Spark task queues |
| allkeys-lfu | Evict least frequently used keys | High read-to-write ratio |
| volatile-lru | Evict LRU keys with expiration set | Mixed usage (tasks + cache) |
| volatile-ttl | Evict keys with shortest TTL first | Time-sensitive data |
| noeviction | Return errors when memory limit reached | Guaranteed task persistence |
Recommendation: use allkeys-lru for task queues. Tasks are consumed quickly, so older keys can be evicted safely.
Monitor Redis Memory
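The same numbers `redis-cli info memory` reports, via redis-py:

```python
import redis

mem = redis.Redis(host="localhost", port=6379).info("memory")
print(mem["used_memory_human"], "used of", mem.get("maxmemory_human", "unlimited"))
```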
Redis Persistence Options
For production task queues, choose based on your durability requirements: RDB snapshots are cheap but can lose the writes since the last snapshot on a crash, while AOF (appendonly yes) logs every write for near-zero loss at the cost of extra disk I/O.
Queue Priority Strategies
Configure Task Queues
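A sketch of explicit queue definitions; the queue names are illustrative, and the app name and broker URL are placeholders:

```python
from celery import Celery
from kombu import Queue

app = Celery("spark", broker="redis://localhost:6379/0")  # placeholder broker URL

app.conf.task_queues = (
    Queue("high_priority"),
    Queue("default"),
    Queue("low_priority"),
)
app.conf.task_default_queue = "default"  # tasks without a route land here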
Route Tasks by Priority
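Routing is declarative; continuing the previous sketch (the task module paths are hypothetical):

```python
app.conf.task_routes = {
    "spark.tasks.run_workflow": {"queue": "high_priority"},  # hypothetical task
    "spark.tasks.cleanup_*":    {"queue": "low_priority"},   # glob patterns work
}
```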
Start Workers for Specific Queues
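For example, `celery -A <your_app> worker -Q high_priority --concurrency=4` starts a worker that consumes only the high-priority queue; run a second worker with `-Q default,low_priority` so lower-priority tasks still drain.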
Long-Running Task Handling
Configure Task Timeouts
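A sketch of global limits, continuing the placeholder app; both settings can also be set per task:

```python
app.conf.update(
    task_soft_time_limit=300,  # raises SoftTimeLimitExceeded inside the task at 5 min
    task_time_limit=360,       # hard limit: the worker child is killed at 6 min
)
```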
Retry Configuration
| Attempt | Delay | Cumulative Time |
|---|---|---|
| 1st retry | 2s | 2s |
| 2nd retry | 4s | 6s |
| 3rd retry | 8s | 14s |
Handle Connection Errors with Retry
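A hedged sketch; the task body and URLs are placeholders, and retry_backoff=2 with jitter disabled reproduces the 2s/4s/8s schedule from the table above:

```python
import requests
from celery import Celery

app = Celery("spark", broker="redis://localhost:6379/0")  # placeholder broker URL

@app.task(
    autoretry_for=(requests.exceptions.ConnectionError,),
    retry_backoff=2,      # 2s, 4s, 8s between attempts
    retry_jitter=False,   # keep the deterministic schedule shown above
    retry_kwargs={"max_retries": 3},
)
def run_workflow(workflow_id: str):
    # Placeholder body: call the workflow engine over HTTP.
    requests.post(f"https://workflows.example/run/{workflow_id}", timeout=30)
```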
Monitoring and Alerting
Key Metrics to Track
| Metric | Warning Threshold | Critical Threshold | Action |
|---|---|---|---|
| Task queue depth | > 100 tasks | > 500 tasks | Scale workers |
| Task failure rate | > 5% | > 20% | Check workflow errors |
| Worker CPU usage | > 80% | > 95% | Add more workers |
| Database connections | > 70% of max | > 90% of max | Increase pool size |
| Redis memory usage | > 70% | > 90% | Increase memory limit |
| Task execution time | > 5 minutes | > 10 minutes | Optimize workflows |
| PostgreSQL query time | > 100ms | > 500ms | Add indexes, optimize queries |
Monitoring Commands
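From a shell these are typically `celery -A <your_app> inspect active` and `redis-cli llen celery` (queue depth, assuming the default queue name); programmatically, a sketch:

```python
from celery import Celery

app = Celery("spark", broker="redis://localhost:6379/0")  # placeholder broker URL

insp = app.control.inspect()
print(insp.active())     # tasks running right now, per worker
print(insp.reserved())   # tasks prefetched but not yet started
print(insp.scheduled())  # tasks waiting on an ETA or countdown
```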
Set Up Alerting (Prometheus Example)
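The exact rules depend on your exporter setup, but the shape is consistent: export Celery and Redis metrics to Prometheus (community exporters exist for both), then alert when a metric breaches the thresholds above for a sustained window, e.g. warn when queue depth holds above 100 for five minutes and page when it holds above 500.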
Performance Tuning Table
Recommended Configurations by Scale
| Scale | Workers | Threads/Worker | Pool Size | Redis Memory | Expected Throughput |
|---|---|---|---|---|---|
| Small | 2 | 4 | 10 | 512MB | ~50 tasks/min |
| Medium | 5 | 4 | 20 | 2GB | ~200 tasks/min |
| Large | 10 | 8 | 40 | 4GB | ~800 tasks/min |
| Enterprise | 20 | 8 | 80 | 8GB | ~1600 tasks/min |
Bottleneck Identification
| Symptom | Likely Bottleneck | Solution |
|---|---|---|
| Tasks queued but not executing | Too few workers | Scale workers |
| High CPU on workers | CPU-bound tasks | Add more worker instances |
| High memory on workers | Memory leaks | Reduce max_tasks_per_child |
| Slow database queries | Database connections exhausted | Increase pool size or use PgBouncer |
| Redis memory errors | Queue overflow | Increase Redis memory or process tasks faster |
| Tasks timing out | Long-running workflows | Increase soft_time_limit |
Load Testing
Simulate High Task Load
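A minimal load generator, reusing the hypothetical run_workflow task from the retry example:

```python
from tasks import run_workflow  # hypothetical module containing the task

# Enqueue 1,000 tasks as fast as possible, then watch how quickly workers drain them.
for i in range(1000):
    run_workflow.delay(f"load-test-{i}")
```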
Monitor During Load Test
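While the test runs, watch queue depth (`redis-cli llen celery`, assuming the default queue name), active tasks (`celery -A <your_app> inspect active`), and the database and Redis thresholds from the metrics table; effective throughput is tasks enqueued divided by the time the queue takes to drain.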
Next Steps
- Review Custom Adapters to build your own workflow integrations
- Check Production Deployment for complete production setup
- Troubleshoot your deployment with the Common Errors guide

