Prerequisites

Before deploying Spark to production, ensure you have:
  • Server Requirements
    • Linux server (Ubuntu 20.04+ or Debian 11+ recommended)
    • Minimum 2GB RAM (4GB+ recommended for production workloads)
    • 20GB available disk space
    • Docker 20.10+ and Docker Compose 2.0+
  • Network Requirements
    • Port 8883 available for Spark API
    • Internal ports for PostgreSQL (5402) and Redis (5412)
    • Domain name (optional, for SSL/HTTPS access)
  • Optional
    • SSL certificates (Let’s Encrypt recommended)
    • Nginx or Traefik for reverse proxy
    • Monitoring tools (Prometheus, Grafana)
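
A quick way to confirm these prerequisites on the target server:
# Check Docker and Docker Compose versions
docker --version            # should report 20.10 or newer
docker-compose --version    # should report 2.0 or newer

# Check available memory and disk space
free -h
df -h /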

Docker Compose Setup

Complete Production Configuration

Create a docker-compose.yml file with the following production-ready configuration:
docker-compose.yml
services:
  # Redis for Celery task queue
  redis:
    image: redis:7.4.2-alpine
    container_name: automagik-spark-redis
    ports:
      - "${AUTOMAGIK_SPARK_REDIS_PORT:-5412}:${AUTOMAGIK_SPARK_REDIS_PORT:-5412}"
    command: redis-server --port ${AUTOMAGIK_SPARK_REDIS_PORT:-5412} --appendonly yes --requirepass "${AUTOMAGIK_SPARK_REDIS_PASSWORD:-spark_redis_pass}"
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-p", "${AUTOMAGIK_SPARK_REDIS_PORT:-5412}", "-a", "${AUTOMAGIK_SPARK_REDIS_PASSWORD:-spark_redis_pass}", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # PostgreSQL database
  automagik-spark-db:
    image: postgres:15
    container_name: automagik-spark-db
    environment:
      POSTGRES_USER: ${AUTOMAGIK_SPARK_POSTGRES_USER:-spark_user}
      POSTGRES_PASSWORD: ${AUTOMAGIK_SPARK_POSTGRES_PASSWORD:-spark_pass}
      POSTGRES_DB: automagik_spark
    volumes:
      - automagik-spark-db-data:/var/lib/postgresql/data
    command: postgres -p ${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}
    ports:
      - "${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}:${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${AUTOMAGIK_SPARK_POSTGRES_USER:-spark_user} -p ${AUTOMAGIK_SPARK_POSTGRES_PORT:-5402}"]
      interval: 5s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # Spark API service
  automagik-spark-api:
    image: namastexlabs/automagik_spark-spark-api:latest
    container_name: automagik-spark-api
    env_file:
      - .env
    environment:
      ENVIRONMENT: production
      PYTHONUNBUFFERED: 1
    ports:
      - "8883:8883"
    depends_on:
      automagik-spark-db:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8883/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # Spark Worker service (scales horizontally)
  automagik-spark-worker:
    image: namastexlabs/automagik_spark-spark-worker:latest
    container_name: automagik-spark-worker
    env_file:
      - .env
    environment:
      ENVIRONMENT: production
      PYTHONUNBUFFERED: 1
    depends_on:
      automagik-spark-db:
        condition: service_healthy
      automagik-spark-api:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - automagik-spark-network

  # Celery Beat scheduler (singleton - do not scale)
  automagik-spark-beat:
    image: namastexlabs/automagik_spark-spark-worker:latest
    container_name: automagik-spark-beat
    env_file:
      - .env
    environment:
      ENVIRONMENT: production
      PYTHONUNBUFFERED: 1
    command: celery -A automagik_spark.core.celery.celery_app beat --loglevel=INFO --schedule=/tmp/celerybeat-schedule --max-interval=1
    depends_on:
      automagik-spark-db:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - automagik-spark-network

volumes:
  automagik-spark-db-data:
  redis-data:

networks:
  automagik-spark-network:
    name: automagik-spark-network
Important: The Celery Beat scheduler (automagik-spark-beat) must run as a singleton. Never scale this service to multiple instances, as it will cause duplicate task scheduling.
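
Once the accompanying .env file (next section) is in place, you can sanity-check the compose file and inspect the fully resolved configuration before starting anything:
# Validate the compose file and show the resolved configuration
docker-compose config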

Environment Configuration

Production Environment Variables

Create a .env file in the same directory as your docker-compose.yml:
.env
# =================================================================
# 🌍 Global Environment Configuration
# =================================================================
ENVIRONMENT=production

# Global Logging Configuration
LOG_LEVEL=INFO
LOG_FOLDER=./logs

# Global Timezone (Critical for scheduling)
AUTOMAGIK_TIMEZONE=UTC

# Global Encryption and Security
AUTOMAGIK_ENCRYPTION_KEY=<GENERATE_SECURE_KEY>

# =================================================================
# 🔧 Core Application Settings (REQUIRED)
# =================================================================
AUTOMAGIK_SPARK_API_KEY=<GENERATE_SECURE_API_KEY>
AUTOMAGIK_SPARK_API_HOST=0.0.0.0
AUTOMAGIK_SPARK_API_PORT=8883

# CORS origins (comma-separated list)
AUTOMAGIK_SPARK_API_CORS=https://yourdomain.com,https://app.yourdomain.com

# Public URL where Spark API is accessible
AUTOMAGIK_SPARK_REMOTE_URL=https://yourdomain.com

# =================================================================
# 🗄️ Database Configuration
# =================================================================
# PostgreSQL connection for production
AUTOMAGIK_SPARK_POSTGRES_USER=spark_user
AUTOMAGIK_SPARK_POSTGRES_PASSWORD=<GENERATE_SECURE_DB_PASSWORD>
AUTOMAGIK_SPARK_POSTGRES_PORT=5402

# Database URL for application
AUTOMAGIK_SPARK_DATABASE_URL=postgresql+asyncpg://spark_user:<GENERATE_SECURE_DB_PASSWORD>@automagik-spark-db:5402/automagik_spark

# =================================================================
# 📬 Celery Configuration (Task Queue & Scheduling)
# =================================================================
# Redis configuration
AUTOMAGIK_SPARK_REDIS_PORT=5412
AUTOMAGIK_SPARK_REDIS_PASSWORD=<GENERATE_SECURE_REDIS_PASSWORD>

# Celery broker and result backend
AUTOMAGIK_SPARK_CELERY_BROKER_URL=redis://:<GENERATE_SECURE_REDIS_PASSWORD>@redis:5412/0
AUTOMAGIK_SPARK_CELERY_RESULT_BACKEND=redis://:<GENERATE_SECURE_REDIS_PASSWORD>@redis:5412/0

# Worker log file
AUTOMAGIK_SPARK_WORKER_LOG=/var/log/automagik/worker.log

# Spark-Specific Encryption Key
AUTOMAGIK_SPARK_ENCRYPTION_KEY=<GENERATE_SECURE_KEY>

# =================================================================
# 🔌 Optional: Integration Settings
# =================================================================
# LangFlow Integration (if using)
# LANGFLOW_API_URL=http://your-langflow-instance:7860
# LANGFLOW_API_KEY=

# AutoMagik Hive Integration (if using)
# AUTOMAGIK_API_URL=http://your-hive-instance:8881

Generate Secure Keys

Security Critical: Never use the example keys in production. Generate strong, random keys for all security-sensitive values.
Generate secure keys using these commands:
# Generate API Key (32 bytes, 64 hex characters)
openssl rand -hex 32

# Generate Encryption Key (base64 encoded, 32 bytes)
openssl rand -base64 32

# Generate Database Password (32 bytes, 64 hex characters)
openssl rand -hex 32

# Generate Redis Password (32 bytes, 64 hex characters)
openssl rand -hex 32
Replace <GENERATE_SECURE_KEY>, <GENERATE_SECURE_API_KEY>, <GENERATE_SECURE_DB_PASSWORD>, and <GENERATE_SECURE_REDIS_PASSWORD> in the .env file with the generated values.
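
If you prefer to fill the placeholders in one pass, a small sketch like the following works, assuming the placeholder strings appear exactly as written above (note that both <GENERATE_SECURE_KEY> occurrences receive the same value here; run separate commands if they must differ):
# Back up the file, then substitute each placeholder with a freshly generated value
cp .env .env.bak
sed -i "s|<GENERATE_SECURE_API_KEY>|$(openssl rand -hex 32)|g" .env
sed -i "s|<GENERATE_SECURE_KEY>|$(openssl rand -base64 32)|g" .env
sed -i "s|<GENERATE_SECURE_DB_PASSWORD>|$(openssl rand -hex 32)|g" .env
sed -i "s|<GENERATE_SECURE_REDIS_PASSWORD>|$(openssl rand -hex 32)|g" .env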

Initial Setup Steps

1. Clone or Copy Files

# Create deployment directory
mkdir -p /opt/automagik-spark
cd /opt/automagik-spark

# Create docker-compose.yml and .env files
# (Copy the configurations from above)

2. Configure Environment

# Edit .env file with your secure values
nano .env

# Verify configuration
cat .env | grep -v "PASSWORD\|KEY" | grep -v "^#"

3. Start Services

# Pull latest images
docker-compose pull

# Start all services
docker-compose up -d

# Verify all containers are running
docker-compose ps
Expected output:
NAME                      STATUS              PORTS
automagik-spark-api       Up 30 seconds       0.0.0.0:8883->8883/tcp
automagik-spark-beat      Up 30 seconds
automagik-spark-db        Up 45 seconds       0.0.0.0:5402->5402/tcp
automagik-spark-worker    Up 30 seconds
automagik-spark-redis     Up 45 seconds       0.0.0.0:5412->5412/tcp

4. Run Database Migrations

# Run migrations inside the API container
docker-compose exec automagik-spark-api alembic upgrade head

5. Verify Health

# Check API health endpoint
curl http://localhost:8883/health

# Expected response:
# {"status":"healthy","api":"running","worker":"running","beat":"running","database":"connected","redis":"connected"}

6. View Logs

# Follow all logs
docker-compose logs -f

# Follow specific service logs
docker-compose logs -f automagik-spark-api
docker-compose logs -f automagik-spark-worker
docker-compose logs -f automagik-spark-beat

7. Add First Source

# Add a LangFlow source
curl -X POST http://localhost:8883/api/v1/sources \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-langflow",
    "source_type": "langflow",
    "base_url": "http://langflow:7860",
    "api_key": "your-langflow-api-key"
  }'
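
If you are scripting the setup, the new source's ID can be captured for the next step; this sketch assumes the create response returns an id field and that jq is installed:
# Hypothetical: capture the source ID from the create response
SOURCE_ID=$(curl -s -X POST http://localhost:8883/api/v1/sources \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "production-langflow", "source_type": "langflow", "base_url": "http://langflow:7860", "api_key": "your-langflow-api-key"}' \
  | jq -r '.id')
echo "Created source: $SOURCE_ID"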

8. Test Workflow

# Sync workflows from source
curl -X POST http://localhost:8883/api/v1/workflows/sync/YOUR_SOURCE_ID \
  -H "X-API-Key: YOUR_API_KEY"

# Create a schedule
curl -X POST http://localhost:8883/api/v1/schedules \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_id": "YOUR_WORKFLOW_ID",
    "cron_expression": "*/5 * * * *",
    "enabled": true
  }'

# Verify task execution
docker-compose logs -f automagik-spark-worker

Monitoring Setup

Health Check Endpoints

Spark provides a comprehensive health check endpoint:
# Check overall health
curl http://localhost:8883/health

# Response includes:
# - API status
# - Worker status
# - Beat scheduler status
# - Database connectivity
# - Redis connectivity
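
For unattended monitoring without a full Prometheus stack, a minimal cron-driven watchdog is often enough; replace the logging line with your own alerting hook (email, Slack webhook, etc.):
#!/bin/bash
# /opt/automagik-spark/healthcheck.sh - minimal health watchdog sketch
if ! curl -fsS http://localhost:8883/health > /dev/null; then
  echo "$(date): Spark health check failed" >> /var/log/spark-healthcheck.log
fi

# Example crontab entry to run it every 5 minutes:
# */5 * * * * /opt/automagik-spark/healthcheck.sh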

Log Locations

Logs are available through Docker Compose:
# API logs
docker-compose logs automagik-spark-api

# Worker logs
docker-compose logs automagik-spark-worker

# Beat scheduler logs
docker-compose logs automagik-spark-beat

# Database logs
docker-compose logs automagik-spark-db

# Redis logs
docker-compose logs redis

Task Monitoring

Monitor active and scheduled tasks:
# List all tasks
curl http://localhost:8883/api/v1/tasks \
  -H "X-API-Key: YOUR_API_KEY"

# Get task details
curl http://localhost:8883/api/v1/tasks/TASK_ID \
  -H "X-API-Key: YOUR_API_KEY"

# Check worker status
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect active

Worker Status Checks

# Check if workers are processing tasks
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect active

# Check scheduled tasks
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect scheduled

# Check worker statistics
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect stats

Backup Strategy

PostgreSQL Database Backup

Create automated backups of your PostgreSQL database:
# Manual backup
docker-compose exec automagik-spark-db pg_dump -U spark_user -p 5402 automagik_spark > backup_$(date +%Y%m%d_%H%M%S).sql

# Automated daily backup script
cat > /opt/automagik-spark/backup.sh << 'EOF'
#!/bin/bash
set -euo pipefail
# Run from the compose directory so docker-compose can find docker-compose.yml (important when called from cron)
cd /opt/automagik-spark
BACKUP_DIR="/opt/automagik-spark/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"

# Backup database
docker-compose exec -T automagik-spark-db pg_dump -U spark_user -p 5402 automagik_spark | gzip > "$BACKUP_DIR/spark_db_$DATE.sql.gz"

# Keep only last 7 days of backups
find $BACKUP_DIR -name "spark_db_*.sql.gz" -mtime +7 -delete

echo "Backup completed: spark_db_$DATE.sql.gz"
EOF

chmod +x /opt/automagik-spark/backup.sh

Schedule Automated Backups

Add to crontab for daily backups:
# Edit crontab
crontab -e

# Add daily backup at 2 AM
0 2 * * * /opt/automagik-spark/backup.sh >> /var/log/spark-backup.log 2>&1

Restore Procedure

Restore from a backup file:
# Stop services
docker-compose stop automagik-spark-api automagik-spark-worker automagik-spark-beat

# Restore database
gunzip < backups/spark_db_20250104_020000.sql.gz | docker-compose exec -T automagik-spark-db psql -U spark_user -p 5402 automagik_spark

# Start services
docker-compose start automagik-spark-api automagik-spark-worker automagik-spark-beat

# Verify health
curl http://localhost:8883/health

Scaling Workers

Spark workers can be scaled horizontally to handle increased load. Note that Docker Compose cannot scale a service that declares a fixed container_name, so remove (or comment out) the container_name line from the automagik-spark-worker service before scaling:
# Scale to 3 worker instances
docker-compose up -d --scale automagik-spark-worker=3

# Verify scaled workers
docker-compose ps automagik-spark-worker

# Check worker stats
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect stats
Never scale the Beat scheduler: The automagik-spark-beat service must always run as a single instance. Scaling it will cause duplicate task scheduling.

Worker Resource Limits

Add resource limits to your docker-compose.yml:
automagik-spark-worker:
  # ... existing configuration ...
  deploy:
    resources:
      limits:
        cpus: '1.0'
        memory: 1G
      reservations:
        cpus: '0.5'
        memory: 512M
  restart: unless-stopped

Security Checklist

Before going to production, verify these security measures:
  • Strong API Key: Generated with openssl rand -hex 32
  • Strong Encryption Key: Generated with openssl rand -base64 32
  • Database Password: Strong random password set
  • Redis Password: Strong random password configured
  • Ports Not Exposed: Only port 8883 exposed publicly (if needed)
  • HTTPS Enabled: SSL/TLS configured via reverse proxy
  • CORS Configured: Only trusted domains in AUTOMAGIK_SPARK_API_CORS
  • Logs Sanitized: No secrets logged in application logs
  • Environment Variables: Sensitive values not committed to Git
  • Docker Image Security: Using official images from trusted sources
  • Regular Updates: Docker images updated regularly for security patches
  • Firewall Rules: Only necessary ports open on server
  • Database Backups: Automated backup strategy implemented
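
For the firewall item, a typical UFW baseline on Ubuntu looks like this (adjust to your network layout):
# Illustrative UFW rules
ufw allow 22/tcp       # SSH
ufw allow 443/tcp      # HTTPS via reverse proxy
ufw allow 8883/tcp     # Spark API (only if exposed directly without a proxy)
ufw deny 5402/tcp      # PostgreSQL stays internal
ufw deny 5412/tcp      # Redis stays internal
ufw enable
Note that Docker publishes ports directly in iptables and can bypass UFW, so consider binding the PostgreSQL and Redis port mappings to 127.0.0.1 in docker-compose.yml if they must not be reachable externally.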

Update Procedure

Rolling Update (Minimal Downtime)

Update Spark to a new version with minimal downtime:
# Pull latest images
docker-compose pull

# Update API first (brief downtime)
docker-compose up -d automagik-spark-api

# Wait for API to be healthy
sleep 10
curl http://localhost:8883/health

# Update workers one by one (zero downtime)
docker-compose up -d --no-deps --scale automagik-spark-worker=2 automagik-spark-worker
sleep 5
docker-compose up -d --no-deps --scale automagik-spark-worker=1 automagik-spark-worker

# Update beat scheduler (brief schedule processing delay)
docker-compose up -d automagik-spark-beat

# Verify all services
docker-compose ps
curl http://localhost:8883/health

Full Update (With Brief Downtime)

# Pull latest images
docker-compose pull

# Stop all services
docker-compose down

# Start with new images
docker-compose up -d

# Run any pending migrations
docker-compose exec automagik-spark-api alembic upgrade head

# Verify health
curl http://localhost:8883/health

Common Production Issues

Out of Memory

Symptoms: Worker containers restarting, tasks failing with memory errors

Solution:
# Check memory usage
docker stats

# Increase worker memory limits
# Edit docker-compose.yml:
automagik-spark-worker:
  deploy:
    resources:
      limits:
        memory: 2G  # Increase from 1G

Database Connection Pool Exhaustion

Symptoms: “Too many connections” errors in API logs

Solution:
# Increase PostgreSQL max connections
# Edit docker-compose.yml:
automagik-spark-db:
  command: postgres -p 5402 -c max_connections=200

# Restart database
docker-compose restart automagik-spark-db

Redis Memory Limits

Symptoms: Redis evicting keys, tasks not being processed

Solution:
# Set Redis max memory policy
# Edit docker-compose.yml:
redis:
  command: redis-server --port 5412 --appendonly yes --requirepass "${AUTOMAGIK_SPARK_REDIS_PASSWORD}" --maxmemory 2gb --maxmemory-policy allkeys-lru

# Restart Redis
docker-compose restart redis
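
Before raising limits, confirm Redis is actually under memory pressure by checking its memory and eviction stats (port and password as configured in .env):
# Inspect Redis memory usage and eviction counters
docker-compose exec redis redis-cli -p 5412 -a "YOUR_REDIS_PASSWORD" info memory
docker-compose exec redis redis-cli -p 5412 -a "YOUR_REDIS_PASSWORD" info stats | grep evicted_keys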

Worker Crashes

Symptoms: Tasks stuck in “pending” state, worker containers restarting

Diagnosis:
# Check worker logs
docker-compose logs --tail=100 automagik-spark-worker

# Check for resource issues
docker stats automagik-spark-worker

# Check Celery worker health
docker-compose exec automagik-spark-worker celery -A automagik_spark.core.celery.celery_app inspect ping
Solution:
  • Increase worker memory limits
  • Reduce worker concurrency (see the override sketch below)
  • Check for task timeout issues
  • Review task code for errors
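
One way to lower concurrency is to override the worker command in docker-compose.yml; this is a sketch that assumes the image runs a standard Celery worker with the app path used elsewhere in this guide:
automagik-spark-worker:
  # ... existing configuration ...
  command: celery -A automagik_spark.core.celery.celery_app worker --loglevel=INFO --concurrency=2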

Beat Scheduler Not Running

Symptoms: Schedules not firing, no tasks being created

Diagnosis:
# Check beat scheduler logs
docker-compose logs automagik-spark-beat

# Verify beat is running
docker-compose ps automagik-spark-beat
Solution:
# Restart beat scheduler
docker-compose restart automagik-spark-beat

# Check schedule file permissions
docker-compose exec automagik-spark-beat ls -la /tmp/celerybeat-schedule

Database Migration Failures

Symptoms: API container failing to start, migration errors in logs

Solution:
# Check migration status
docker-compose exec automagik-spark-api alembic current

# View pending migrations
docker-compose exec automagik-spark-api alembic history

# Manually run migrations
docker-compose exec automagik-spark-api alembic upgrade head

# If migration fails, check database logs
docker-compose logs automagik-spark-db

Advanced Configuration

Nginx Reverse Proxy with SSL

Configure Nginx as a reverse proxy with SSL:
server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8883;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}
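
Certificates for the server blocks above can be obtained with Certbot (Let's Encrypt), for example:
# Obtain a Let's Encrypt certificate for the domain used above
sudo certbot --nginx -d yourdomain.com

# Verify that automatic renewal works
sudo certbot renew --dry-run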

External PostgreSQL and Redis

Use external PostgreSQL and Redis instances:
services:
  automagik-spark-api:
    image: namastexlabs/automagik_spark-spark-api:latest
    environment:
      AUTOMAGIK_SPARK_DATABASE_URL: postgresql+asyncpg://user:pass@external-db:5432/spark
      AUTOMAGIK_SPARK_CELERY_BROKER_URL: redis://:pass@external-redis:6379/0
      AUTOMAGIK_SPARK_CELERY_RESULT_BACKEND: redis://:pass@external-redis:6379/0
    # Remove depends_on for external services

Multi-Region Deployments

For distributed deployments across regions:
  1. Primary Region: API + Beat + Workers + Database
  2. Secondary Regions: Workers only (connect to primary database/redis)
Secondary region configuration:
services:
  automagik-spark-worker:
    image: namastexlabs/automagik_spark-spark-worker:latest
    environment:
      AUTOMAGIK_SPARK_DATABASE_URL: postgresql+asyncpg://user:pass@primary-db:5432/spark
      AUTOMAGIK_SPARK_CELERY_BROKER_URL: redis://:pass@primary-redis:6379/0

Prometheus Metrics

To export Celery task metrics to Prometheus, add a Celery exporter service:
services:
  celery-exporter:
    image: danihodovic/celery-exporter
    environment:
      CELERY_BROKER_URL: redis://:${AUTOMAGIK_SPARK_REDIS_PASSWORD}@redis:5412/0
    ports:
      - "9540:9540"
    networks:
      - automagik-spark-network
Prometheus configuration:
scrape_configs:
  - job_name: 'celery'
    static_configs:
      - targets: ['celery-exporter:9540']

Production Best Practices

  1. Use Specific Image Tags: Pin to specific versions instead of :latest
    image: namastexlabs/automagik_spark-spark-api:v0.3.8
    
  2. Set Resource Limits: Always define memory and CPU limits
  3. Enable Monitoring: Set up health checks and alerting
  4. Regular Backups: Automate database backups and test restores
  5. Log Rotation: Configure Docker log rotation so container logs cannot fill the disk (see the logging snippet after this list)
  6. Security Updates: Regularly update Docker images and dependencies
  7. Environment Segregation: Use separate .env files for staging and production
  8. Network Isolation: Use Docker networks to isolate services
  9. Secrets Management: Consider using Docker secrets or external vaults
  10. Disaster Recovery Plan: Document and test recovery procedures
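
For the log rotation item above, one common approach is to cap Docker's json-file logging driver per service in docker-compose.yml (sizes are illustrative):
automagik-spark-api:
  # ... existing configuration ...
  logging:
    driver: json-file
    options:
      max-size: "10m"
      max-file: "3"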

Next Steps