Prerequisites
Before deploying Spark to production, ensure you have:
Server Requirements
Linux server (Ubuntu 20.04+ or Debian 11+ recommended)
Minimum 2GB RAM (4GB+ recommended for production workloads)
20GB available disk space
Docker 20.10+ and Docker Compose 2.0+
Network Requirements
Port 8883 available for Spark API
Internal ports for PostgreSQL (5402) and Redis (5412)
Domain name (optional, for SSL/HTTPS access)
Optional
SSL certificates (Let’s Encrypt recommended)
Nginx or Traefik for reverse proxy
Monitoring tools (Prometheus, Grafana)
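A quick way to sanity-check these requirements on the target host (assuming a Debian/Ubuntu server; adjust the commands for your distribution):
# Verify Docker and Compose versions
docker --version          # expect 20.10+
docker compose version    # expect 2.0+
# Check available memory and disk space
free -h
df -h /
# Confirm the Spark ports are free (no grep output means nothing is listening on them)
ss -tlnp | grep -E ':(8883|5402|5412)\b' || echo "Ports 8883, 5402 and 5412 are free"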
Docker Compose Setup
Complete Production Configuration
Create a docker-compose.yml file with the following production-ready configuration:
services:
  # Redis for Celery task queue
  redis:
    image: redis:7.4.2-alpine
    container_name: automagik-spark-redis
    ports:
      - "${AUTOMAGIK_SPARK_REDIS_PORT:-5412}:${AUTOMAGIK_SPARK_REDIS_PORT:-5412}"
    command: redis-server --port ${AUTOMAGIK_SPARK_REDIS_PORT:-5412} --appendonly yes --requirepass "${AUTOMAGIK_SPARK_REDIS_PASSWORD:-spark_redis_pass}"
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-p", "${AUTOMAGIK_SPARK_REDIS_PORT:-5412}", "-a", "${AUTOMAGIK_SPARK_REDIS_PASSWORD:-spark_redis_pass}", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # PostgreSQL database
  automagik-spark-db:
    image: postgres:15
    container_name: automagik-spark-db
    environment:
      POSTGRES_USER: ${AUTOMAGIK_SPARK_POSTGRES_USER:-spark_user}
      POSTGRES_PASSWORD: ${AUTOMAGIK_SPARK_POSTGRES_PASSWORD:-spark_pass}
      POSTGRES_DB: automagik_spark
    volumes:
      - automagik-spark-db-data:/var/lib/postgresql/data
    command: postgres -p ${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}
    ports:
      - "${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}:${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${AUTOMAGIK_SPARK_POSTGRES_USER:-spark_user} -p ${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}"]
      interval: 5s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # Spark API service
  automagik-spark-api:
    image: namastexlabs/automagik_spark-spark-api:latest
    container_name: automagik-spark-api
    env_file:
      - .env
    environment:
      ENVIRONMENT: production
      PYTHONUNBUFFERED: 1
    ports:
      - "8883:8883"
    depends_on:
      automagik-spark-db:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8883/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # Spark Worker service (scales horizontally)
  automagik-spark-worker:
    image: namastexlabs/automagik_spark-spark-worker:latest
    container_name: automagik-spark-worker
    env_file:
      - .env
    environment:
      ENVIRONMENT: production
      PYTHONUNBUFFERED: 1
    depends_on:
      automagik-spark-db:
        condition: service_healthy
      automagik-spark-api:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # Celery Beat scheduler (singleton - do not scale)
  automagik-spark-beat:
    image: namastexlabs/automagik_spark-spark-worker:latest
    container_name: automagik-spark-beat
    env_file:
      - .env
    environment:
      ENVIRONMENT: production
      PYTHONUNBUFFERED: 1
    command: celery -A automagik_spark.core.celery.celery_app beat --loglevel=INFO --schedule=/tmp/celerybeat-schedule --max-interval=1
    depends_on:
      automagik-spark-db:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - automagik-spark-network

volumes:
  automagik-spark-db-data:
  redis-data:

networks:
  automagik-spark-network:
    name: automagik-spark-network
Important: The Celery Beat scheduler (automagik-spark-beat) must run as a singleton. Never scale this service to multiple instances, as it will cause duplicate task scheduling.
Environment Configuration
Production Environment Variables
Create a .env file in the same directory as your docker-compose.yml:
# =================================================================
# 🌍 Global Environment Configuration
# =================================================================
ENVIRONMENT=production
# Global Logging Configuration
LOG_LEVEL=INFO
LOG_FOLDER=./logs
# Global Timezone (Critical for scheduling)
AUTOMAGIK_TIMEZONE=UTC
# Global Encryption and Security
AUTOMAGIK_ENCRYPTION_KEY=<GENERATE_SECURE_KEY>
# =================================================================
# 🔧 Core Application Settings (REQUIRED)
# =================================================================
AUTOMAGIK_SPARK_API_KEY=<GENERATE_SECURE_API_KEY>
AUTOMAGIK_SPARK_API_HOST=0.0.0.0
AUTOMAGIK_SPARK_API_PORT=8883
# CORS origins (comma-separated list)
AUTOMAGIK_SPARK_API_CORS=https://yourdomain.com,https://app.yourdomain.com
# Public URL where Spark API is accessible
AUTOMAGIK_SPARK_REMOTE_URL=https://yourdomain.com
# =================================================================
# 🗄️ Database Configuration
# =================================================================
# PostgreSQL connection for production
AUTOMAGIK_SPARK_POSTGRES_USER=spark_user
AUTOMAGIK_SPARK_POSTGRES_PASSWORD=<GENERATE_SECURE_DB_PASSWORD>
AUTOMAGIK_SPARK_POSTGRES_PORT=5402
# Database URL for application
AUTOMAGIK_SPARK_DATABASE_URL=postgresql+asyncpg://spark_user:<GENERATE_SECURE_DB_PASSWORD>@automagik-spark-db:5402/automagik_spark
# =================================================================
# 📬 Celery Configuration (Task Queue & Scheduling)
# =================================================================
# Redis configuration
AUTOMAGIK_SPARK_REDIS_PORT=5412
AUTOMAGIK_SPARK_REDIS_PASSWORD=<GENERATE_SECURE_REDIS_PASSWORD>
# Celery broker and result backend
AUTOMAGIK_SPARK_CELERY_BROKER_URL=redis://:<GENERATE_SECURE_REDIS_PASSWORD>@redis:5412/0
AUTOMAGIK_SPARK_CELERY_RESULT_BACKEND=redis://:<GENERATE_SECURE_REDIS_PASSWORD>@redis:5412/0
# Worker log file
AUTOMAGIK_SPARK_WORKER_LOG=/var/log/automagik/worker.log
# Spark-Specific Encryption Key
AUTOMAGIK_SPARK_ENCRYPTION_KEY=<GENERATE_SECURE_KEY>
# =================================================================
# 🔌 Optional: Integration Settings
# =================================================================
# LangFlow Integration (if using)
# LANGFLOW_API_URL=http://your-langflow-instance:7860
# LANGFLOW_API_KEY=
# AutoMagik Hive Integration (if using)
# AUTOMAGIK_API_URL=http://your-hive-instance:8881
Generate Secure Keys
Security Critical: Never use the example keys in production. Generate strong, random keys for all security-sensitive values.
Generate secure keys using these commands:
# Generate API Key (32 characters)
openssl rand -hex 32
# Generate Encryption Key (base64 encoded, 32 bytes)
openssl rand -base64 32
# Generate Database Password (32 characters)
openssl rand -hex 32
# Generate Redis Password (32 characters)
openssl rand -hex 32
Replace <GENERATE_SECURE_KEY>, <GENERATE_SECURE_API_KEY>, <GENERATE_SECURE_DB_PASSWORD>, and <GENERATE_SECURE_REDIS_PASSWORD> in the .env file with the generated values.
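If you prefer to script the substitution, a minimal sketch (it assumes the placeholder tokens above appear verbatim in your .env and that GNU sed is available):
# Generate values and replace the placeholders in .env in one pass
API_KEY=$(openssl rand -hex 32)
DB_PASS=$(openssl rand -hex 32)
REDIS_PASS=$(openssl rand -hex 32)
ENC_KEY=$(openssl rand -base64 32)
sed -i \
  -e "s|<GENERATE_SECURE_API_KEY>|${API_KEY}|g" \
  -e "s|<GENERATE_SECURE_DB_PASSWORD>|${DB_PASS}|g" \
  -e "s|<GENERATE_SECURE_REDIS_PASSWORD>|${REDIS_PASS}|g" \
  -e "s|<GENERATE_SECURE_KEY>|${ENC_KEY}|g" \
  .env
# Note: both encryption-key placeholders share <GENERATE_SECURE_KEY>; generate a second
# value manually if you want them to differ. Keep the generated values somewhere safe.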
Initial Setup Steps
1. Clone or Copy Files
# Create deployment directory
mkdir -p /opt/automagik-spark
cd /opt/automagik-spark
# Create docker-compose.yml and .env files
# (Copy the configurations from above)
2. Configure Environment
# Edit .env file with your secure values
nano .env
# Verify configuration
cat .env | grep -v "PASSWORD\|KEY" | grep -v "^#"
3. Start Services
# Pull latest images
docker-compose pull
# Start all services
docker-compose up -d
# Verify all containers are running
docker-compose ps
Expected output:
NAME                     STATUS          PORTS
automagik-spark-api      Up 30 seconds   0.0.0.0:8883->8883/tcp
automagik-spark-beat     Up 30 seconds
automagik-spark-db       Up 45 seconds   0.0.0.0:5402->5402/tcp
automagik-spark-worker   Up 30 seconds
automagik-spark-redis    Up 45 seconds   0.0.0.0:5412->5412/tcp
4. Run Database Migrations
# Run migrations inside the API container
docker-compose exec automagik-spark-api alembic upgrade head
5. Verify Health
# Check API health endpoint
curl http://localhost:8883/health
# Expected response:
# {"status":"healthy","api":"running","worker":"running","beat":"running","database":"connected","redis":"connected"}
6. View Logs
# Follow all logs
docker-compose logs -f
# Follow specific service logs
docker-compose logs -f automagik-spark-api
docker-compose logs -f automagik-spark-worker
docker-compose logs -f automagik-spark-beat
7. Add First Source
# Add a LangFlow source
curl -X POST http://localhost:8883/api/v1/sources \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "production-langflow",
"source_type": "langflow",
"base_url": "http://langflow:7860",
"api_key": "your-langflow-api-key"
}'
8. Test Workflow
# Sync workflows from source
curl -X POST http://localhost:8883/api/v1/workflows/sync/YOUR_SOURCE_ID \
-H "X-API-Key: YOUR_API_KEY"
# Create a schedule
curl -X POST http://localhost:8883/api/v1/schedules \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"workflow_id": "YOUR_WORKFLOW_ID",
"cron_expression": "*/5 * * * *",
"enabled": true
}'
# Verify task execution
docker-compose logs -f automagik-spark-worker
Monitoring Setup
Health Check Endpoints
Spark provides a comprehensive health check endpoint:
# Check overall health
curl http://localhost:8883/health
# Response includes:
# - API status
# - Worker status
# - Beat scheduler status
# - Database connectivity
# - Redis connectivity
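If you want lightweight alerting without a full monitoring stack, a minimal watchdog sketch that could run from cron (the script path, log file, and restart behaviour are assumptions; adapt them to your setup):
#!/bin/bash
# /opt/automagik-spark/healthcheck.sh - restart API and workers if the health endpoint stops responding
HEALTH_URL="http://localhost:8883/health"
cd /opt/automagik-spark
if ! curl -fsS --max-time 10 "$HEALTH_URL" > /dev/null; then
  echo "$(date): health check failed, restarting services" >> /var/log/spark-healthcheck.log
  docker-compose restart automagik-spark-api automagik-spark-worker
fi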
Log Locations
Logs are available through Docker Compose:
# API logs
docker-compose logs automagik-spark-api
# Worker logs
docker-compose logs automagik-spark-worker
# Beat scheduler logs
docker-compose logs automagik-spark-beat
# Database logs
docker-compose logs automagik-spark-db
# Redis logs
docker-compose logs redis
Task Monitoring
Monitor active and scheduled tasks:
# List all tasks
curl http://localhost:8883/api/v1/tasks \
-H "X-API-Key: YOUR_API_KEY"
# Get task details
curl http://localhost:8883/api/v1/tasks/TASK_ID \
-H "X-API-Key: YOUR_API_KEY"
# Check worker status
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect active
Worker Status Checks
# Check if workers are processing tasks
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect active
# Check scheduled tasks
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect scheduled
# Check worker statistics
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect stats
Backup Strategy
PostgreSQL Database Backup
Create automated backups of your PostgreSQL database:
# Manual backup
docker-compose exec automagik-spark-db pg_dump -U spark_user -p 5402 automagik_spark > backup_$(date +%Y%m%d_%H%M%S).sql
# Automated daily backup script
cat > /opt/automagik-spark/backup.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/opt/automagik-spark/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p $BACKUP_DIR
# Run from the compose directory so docker-compose can find docker-compose.yml (important when invoked from cron)
cd /opt/automagik-spark
# Backup database (PostgreSQL listens on 5402 inside the container)
docker-compose exec -T automagik-spark-db pg_dump -U spark_user -p 5402 automagik_spark | gzip > $BACKUP_DIR/spark_db_$DATE.sql.gz
# Keep only last 7 days of backups
find $BACKUP_DIR -name "spark_db_*.sql.gz" -mtime +7 -delete
echo "Backup completed: spark_db_$DATE.sql.gz"
EOF
chmod +x /opt/automagik-spark/backup.sh
Schedule Automated Backups
Add to crontab for daily backups:
# Edit crontab
crontab -e
# Add daily backup at 2 AM
0 2 * * * /opt/automagik-spark/backup.sh >> /var/log/spark-backup.log 2>&1
Restore Procedure
Restore from a backup file:
# Stop services
docker-compose stop automagik-spark-api automagik-spark-worker automagik-spark-beat
# Restore database
gunzip < backups/spark_db_20250104_020000.sql.gz | docker-compose exec -T automagik-spark-db psql -U spark_user -p 5402 automagik_spark
# Start services
docker-compose start automagik-spark-api automagik-spark-worker automagik-spark-beat
# Verify health
curl http://localhost:8883/health
Scaling Workers
Spark workers can be scaled horizontally to handle increased load:
# Scale to 3 worker instances
# (remove the fixed container_name from the worker service first; Compose cannot scale a service with a fixed container name)
docker-compose up -d --scale automagik-spark-worker=3
# Verify scaled workers
docker-compose ps automagik-spark-worker
# Check worker stats
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect stats
Never scale the Beat scheduler: The automagik-spark-beat service must always run as a single instance. Scaling it will cause duplicate task scheduling.
Automatic Scaling with Resource Limits
Add resource limits to your docker-compose.yml:
automagik-spark-worker:
  # ... existing configuration ...
  deploy:
    resources:
      limits:
        cpus: '1.0'
        memory: 1G
      reservations:
        cpus: '0.5'
        memory: 512M
  restart: unless-stopped
Security Checklist
Before going to production, verify these security measures:
All placeholder keys and passwords in .env have been replaced with generated values
.env is readable only by the deployment user (chmod 600 .env) and is not committed to version control
AUTOMAGIK_SPARK_API_CORS is restricted to your real domains
PostgreSQL (5402) and Redis (5412) are not reachable from the public internet (see the check below)
The API is served over HTTPS via a reverse proxy if it is exposed beyond localhost
Docker images and the host OS are kept up to date
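As a quick check of the network items above, a minimal sketch (nc must be installed; YOUR_SERVER_IP is your server's public address):
# From a machine outside your network, both checks should fail (connection refused or timeout)
nc -zv -w 3 YOUR_SERVER_IP 5402
nc -zv -w 3 YOUR_SERVER_IP 5412
# If they succeed, bind the published ports to localhost in docker-compose.yml
# (e.g. "127.0.0.1:5402:5402") or restrict them with your firewall.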
Update Procedure
Update Spark to a new version with minimal downtime:
Rolling Update (Recommended)
# Pull latest images
docker-compose pull
# Update API first (brief downtime)
docker-compose up -d automagik-spark-api
# Wait for API to be healthy
sleep 10
curl http://localhost:8883/health
# Update workers one by one (zero downtime)
docker-compose up -d --no-deps --scale automagik-spark-worker=2 automagik-spark-worker
sleep 5
docker-compose up -d --no-deps --scale automagik-spark-worker=1 automagik-spark-worker
# Update beat scheduler (brief schedule processing delay)
docker-compose up -d automagik-spark-beat
# Verify all services
docker-compose ps
curl http://localhost:8883/health
Full Update (With Brief Downtime)
# Pull latest images
docker-compose pull
# Stop all services
docker-compose down
# Start with new images
docker-compose up -d
# Run any pending migrations
docker-compose exec automagik-spark-api alembic upgrade head
# Verify health
curl http://localhost:8883/health
Common Production Issues
Out of Memory
Symptoms: Worker containers restarting, tasks failing with memory errors
Solution:
# Check memory usage
docker stats
# Increase worker memory limits
# Edit docker-compose.yml:
automagik-spark-worker:
deploy:
resources:
limits:
memory: 2G # Increase from 1G
Database Connection Pool Exhaustion
Symptoms: “Too many connections” errors in API logs
Solution:
# Increase PostgreSQL max connections
# Edit docker-compose.yml:
automagik-spark-db:
command: postgres -p 5402 -c max_connections=200
# Restart database
docker-compose restart automagik-spark-db
Redis Memory Limits
Symptoms: Redis evicting keys, tasks not being processed
Solution:
# Set Redis max memory policy
# Edit docker-compose.yml:
redis:
command: redis-server --port 5412 --appendonly yes --requirepass "${AUTOMAGIK_SPARK_REDIS_PASSWORD}" --maxmemory 2gb --maxmemory-policy allkeys-lru
# Restart Redis
docker-compose restart redis
Worker Crashes
Symptoms: Tasks stuck in “pending” state, worker containers restarting
Diagnosis:
# Check worker logs
docker-compose logs --tail=100 automagik-spark-worker
# Check for resource issues
docker stats automagik-spark-worker
# Check Celery worker health
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect ping
Solution:
Increase worker memory limits
Reduce worker concurrency (see the example after this list)
Check for task timeout issues
Review task code for errors
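One way to reduce concurrency, following the same docker-compose.yml override style as above (the worker command shown is an assumption based on the beat command earlier in this guide; verify the entrypoint your worker image actually uses):
# Edit docker-compose.yml (worker service):
automagik-spark-worker:
  command: celery -A automagik_spark.core.celery.celery_app worker --loglevel=INFO --concurrency=2
# Restart workers
docker-compose up -d automagik-spark-worker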
Beat Scheduler Not Running
Symptoms: Schedules not firing, no tasks being created
Diagnosis:
# Check beat scheduler logs
docker-compose logs automagik-spark-beat
# Verify beat is running
docker-compose ps automagik-spark-beat
Solution:
# Restart beat scheduler
docker-compose restart automagik-spark-beat
# Check schedule file permissions
docker-compose exec automagik-spark-beat ls -la /tmp/celerybeat-schedule
Database Migration Failures
Symptoms: API container failing to start, migration errors in logs
Solution:
# Check migration status
docker-compose exec automagik-spark-api alembic current
# View pending migrations
docker-compose exec automagik-spark-api alembic history
# Manually run migrations
docker-compose exec automagik-spark-api alembic upgrade head
# If migration fails, check database logs
docker-compose logs automagik-spark-db
Advanced Configuration
HTTPS with Nginx Reverse Proxy
Configure Nginx as a reverse proxy with SSL:
server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8883;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}
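The certificate paths above assume an existing Let's Encrypt certificate; one way to obtain it (assuming certbot is installed on the host and port 80 is reachable):
# Issue the certificate (stop Nginx first if using --standalone, or use the --nginx plugin instead)
sudo certbot certonly --standalone -d yourdomain.com
# Confirm automatic renewal works
sudo certbot renew --dry-run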
Docker Compose with External Services
Use external PostgreSQL and Redis instances:
services:
  automagik-spark-api:
    image: namastexlabs/automagik_spark-spark-api:latest
    environment:
      AUTOMAGIK_SPARK_DATABASE_URL: postgresql+asyncpg://user:pass@external-db:5432/spark
      AUTOMAGIK_SPARK_CELERY_BROKER_URL: redis://:pass@external-redis:6379/0
      AUTOMAGIK_SPARK_CELERY_RESULT_BACKEND: redis://:pass@external-redis:6379/0
    # Remove depends_on for external services
For distributed deployments across regions:
Primary Region: API + Beat + Workers + Database
Secondary Regions: Workers only (connect to primary database/redis)
Secondary region configuration:
services:
  automagik-spark-worker:
    image: namastexlabs/automagik_spark-spark-worker:latest
    environment:
      AUTOMAGIK_SPARK_DATABASE_URL: postgresql+asyncpg://user:pass@primary-db:5432/spark
      AUTOMAGIK_SPARK_CELERY_BROKER_URL: redis://:pass@primary-redis:6379/0
Monitoring with Prometheus
Add a Celery Prometheus exporter:
services:
  celery-exporter:
    image: danihodovic/celery-exporter
    environment:
      CELERY_BROKER_URL: redis://:${AUTOMAGIK_SPARK_REDIS_PASSWORD}@redis:5412/0
    ports:
      - "9540:9540"
    networks:
      - automagik-spark-network  # attach to the Spark network so the exporter can reach Redis
Prometheus configuration:
scrape_configs:
  - job_name: 'celery'
    static_configs:
      - targets: ['celery-exporter:9540']
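Once the exporter is running, you can confirm it is serving metrics before pointing Prometheus at it (9540 is the port published above):
# Verify the exporter responds with Celery metrics
curl -s http://localhost:9540/metrics | grep -i celery | head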
Production Best Practices
Use Specific Image Tags: Pin to specific versions instead of :latest
image: namastexlabs/automagik_spark-spark-api:v0.3.8
Set Resource Limits: Always define memory and CPU limits
Enable Monitoring: Set up health checks and alerting
Regular Backups: Automate database backups and test restores
Log Rotation: Configure Docker log rotation so container logs cannot fill the disk (see the daemon.json sketch after this list)
Security Updates: Regularly update Docker images and dependencies
Environment Segregation: Use separate .env files for staging and production
Network Isolation: Use Docker networks to isolate services
Secrets Management: Consider using Docker secrets or external vaults
Disaster Recovery Plan: Document and test recovery procedures
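For the Log Rotation item above, one host-wide approach is to cap container log size through the Docker daemon. This is a sketch assuming the default json-file logging driver; the sizes are examples, and restarting the daemon briefly interrupts running containers:
# Warning: this overwrites any existing /etc/docker/daemon.json; merge by hand if one exists
sudo tee /etc/docker/daemon.json > /dev/null << 'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
EOF
sudo systemctl restart docker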
Next Steps