Overview
Spark’s scheduling system is built on Celery Beat, which continuously monitors schedules stored in PostgreSQL and triggers task execution when schedules are due. This document explains the complete flow from schedule creation to task execution.
Key Components
Database Scheduler
The heart of Spark’s scheduling system is the `DatabaseScheduler` class (source), which extends Celery’s base `Scheduler` class to load schedules directly from the PostgreSQL database instead of using a file-based schedule.
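A minimal sketch of the pattern, assuming a hypothetical `fetch_active_schedules()` helper and task name (the real implementation lives in `automagik_spark/core/celery/scheduler.py` and differs in detail):

```python
from celery import Celery
from celery.beat import Scheduler, ScheduleEntry
from celery.schedules import schedule as every


def fetch_active_schedules():
    """Hypothetical helper returning active schedule rows from PostgreSQL."""
    return []


class SketchDatabaseScheduler(Scheduler):
    def setup_schedule(self):
        # Build the in-memory schedule from database rows instead of
        # the file-based store the base Scheduler would use.
        self.schedule = self.load_entries_from_db()

    def load_entries_from_db(self):
        entries = {}
        for row in fetch_active_schedules():
            entries[str(row.id)] = ScheduleEntry(
                name=str(row.id),
                task="spark.run_workflow",  # hypothetical task name
                schedule=every(run_every=row.interval_seconds),
                args=(str(row.id),),
                app=self.app,
            )
        return entries
```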
Why database-backed scheduling?
- Dynamic schedules can be added/updated without restarting the beat process
- Multiple beat instances can read from the same database (high availability)
- Schedule state is persistent across restarts
- Integration with the rest of Spark’s data model
Schedule Monitoring Loop
Celery Beat runs a continuous loop that checks schedules every 5 seconds (configured via `beat_max_loop_interval`):
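For illustration, a Celery app could pin this interval as follows (a sketch; Spark sets this in its own Celery config):

```python
from celery import Celery

app = Celery("spark")  # illustrative app name
# beat_max_loop_interval caps how long beat may sleep between ticks.
app.conf.beat_max_loop_interval = 5
```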
The Tick Method
The `tick()` method in `DatabaseScheduler` is called by Celery Beat every 5 seconds:
- Refreshes schedules from the database
- Delegates to parent Celery’s `Scheduler.tick()` to evaluate schedules
- Returns control back to the beat process
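Continuing the scheduler sketch above, the override could look like this (simplified; consult `automagik_spark/core/celery/scheduler.py` for the real logic):

```python
    def tick(self, *args, **kwargs):
        # Refresh entries so schedules added or updated since the last
        # tick take effect without restarting beat.
        self.schedule = self.load_entries_from_db()
        # The base Scheduler decides what is due, queues it, and returns
        # how long beat should sleep before the next tick.
        return super().tick(*args, **kwargs)
```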
Schedule Types Compared
Spark supports three types of schedules, each with different behavior:
| Type | Expression Format | Example | Next Run Calculation | Use Case |
|---|---|---|---|---|
| interval | {value}{unit} where unit is m, h, or d | 30m, 2h, 1d | Current time + interval | Regular recurring tasks |
| cron | Standard 5-part cron expression | 0 9 * * 1-5 (9am weekdays) | Next cron match from current time | Complex schedules with specific timing |
| one-time | ISO datetime or now | 2024-12-25T09:00:00Z or now | Scheduled time (or immediate) | One-off task execution |
Interval Schedules
Interval schedules are parsed and converted to seconds:
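A sketch of that conversion, matching the `{value}{unit}` format from the table above (the helper name is hypothetical; schedule validation lives in `automagik_spark/core/scheduler/utils.py`):

```python
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def interval_to_seconds(expression: str) -> int:
    """Convert expressions like '30m', '2h', or '1d' into seconds."""
    value, unit = expression[:-1], expression[-1]
    if not value.isdigit() or unit not in UNIT_SECONDS:
        raise ValueError(f"Invalid interval expression: {expression!r}")
    return int(value) * UNIT_SECONDS[unit]


assert interval_to_seconds("30m") == 1800
assert interval_to_seconds("2h") == 7200
assert interval_to_seconds("1d") == 86400
```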
Cron Schedules
Cron expressions use the standard 5-part format:
- `0 9 * * *` - Daily at 9:00 AM UTC
- `0 */2 * * *` - Every 2 hours
- `0 9 * * 1-5` - Weekdays at 9:00 AM
- `*/15 * * * *` - Every 15 minutes
One-Time Schedules
One-time schedules execute once at a specific time. After firing, they remain `status="active"` but will not trigger again, since their scheduled time has passed.
Cron Expression Parsing
Spark uses the croniter library to validate and parse cron expressions. The validation happens at schedule creation time:
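A sketch of what that validation can look like with croniter (the function shape is an assumption; the real check lives in `automagik_spark/core/scheduler/utils.py`):

```python
from datetime import datetime, timezone

from croniter import croniter


def validate_cron(expression: str) -> datetime:
    """Reject bad expressions and return the next UTC run time."""
    if not croniter.is_valid(expression):
        raise ValueError(f"Invalid cron expression: {expression!r}")
    # next_run_at: first match strictly after the current UTC time.
    return croniter(expression, datetime.now(timezone.utc)).get_next(datetime)


print(validate_cron("0 9 * * 1-5"))  # next weekday at 9:00 AM UTC
```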
Timezone Handling
All schedules in Spark are stored and executed in UTC. This is critical for consistent behavior across deployments.
Timezone Conversion Flow
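For example, a user who wants a 9:00 AM schedule in São Paulo time would convert it to UTC before creating the schedule (a standard-library sketch; the timezone choice is illustrative):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

local = datetime(2024, 12, 25, 9, 0, tzinfo=ZoneInfo("America/Sao_Paulo"))
utc = local.astimezone(ZoneInfo("UTC"))  # the value to store on the schedule
print(utc.isoformat())                   # 2024-12-25T12:00:00+00:00
```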
Configuration
The Celery timezone is configured in `celery_config.py`:
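A sketch of what those settings amount to (the actual `celery_config.py` may read more options):

```python
import os

# Celery settings: keep all scheduling arithmetic in UTC.
timezone = os.getenv("AUTOMAGIK_TIMEZONE", "UTC")
enable_utc = True
```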
Keep `AUTOMAGIK_TIMEZONE=UTC` unless you have specific requirements. If you need schedules in local time, calculate the UTC equivalent when creating schedules.
Task Creation from Schedules
When a schedule fires, Celery Beat doesn’t directly execute the workflow. Instead, it creates a Celery task that gets queued in Redis for workers to process.
What Happens When a Schedule Fires
- Beat detects due schedule (during tick loop)
- Creates a `ScheduleEntry` with task details (see the sketch after this list)
- Celery sends task to Redis queue
- Worker picks up task from queue (covered in Task Execution)
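Putting the hand-off together, queuing amounts to a task submission like this (the task and queue names are assumptions, not Spark’s confirmed identifiers):

```python
from celery import Celery

app = Celery("spark", broker="redis://localhost:6379/0")

schedule_id = "example-schedule-id"  # the due schedule's ID
# Serializes the task message and pushes it onto the Redis queue,
# where any available worker can pick it up.
app.send_task("spark.run_workflow", args=[schedule_id], queue="default")
```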
Schedule-to-Task Mapping
The relationship between schedules and tasks: when a worker picks up a queued task, it:
- Reads the schedule from the database using `schedule_id`
- Creates a `Task` database record
- Executes the workflow associated with the schedule
- Updates the `Task` record with results
- Calculates and updates the schedule’s `next_run_at`
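As a sketch of those steps, with every helper below standing in as a hypothetical placeholder for Spark’s real database and adapter code:

```python
def load_schedule(schedule_id): ...
def create_task_record(schedule): ...
def run_workflow(schedule): ...
def finish_task(task, result): ...
def advance_next_run_at(schedule): ...


def run_scheduled_workflow(schedule_id: str):
    schedule = load_schedule(schedule_id)  # read the schedule row
    task = create_task_record(schedule)    # create the Task record
    result = run_workflow(schedule)        # execute the associated workflow
    finish_task(task, result)              # update Task with results
    advance_next_run_at(schedule)          # compute the next fire time
```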
Complete Schedule Fire Flow
Here’s the complete flow from schedule detection to task queuing:
- `tick()` runs and refreshes schedules from PostgreSQL
- Due schedules (`next_run_at <= now`) are selected under a `FOR UPDATE` lock
- A `ScheduleEntry` with the task details is built for each due schedule
- Celery serializes the task and pushes it onto the Redis queue
- A worker picks the task up and proceeds as described above
Retry Policy
When a task is queued from a schedule, it includes a retry policy:
- Attempt 1: Execute immediately
- Attempt 2 (if failed): Wait 0.2 seconds, retry
- Attempt 3 (if failed): Wait 0.2 seconds, retry
- Attempt 4 (if failed): Wait 0.2 seconds, retry
- After 3 retries: Task fails permanently
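The timing above matches the shape of Celery’s publish retry options; a sketch (the keys are Celery’s documented `retry_policy` settings, and whether Spark uses these exact knobs is an assumption):

```python
retry_policy = {
    "max_retries": 3,       # attempts 2-4, then fail permanently
    "interval_start": 0.2,  # wait 0.2 s before the first retry
    "interval_step": 0.0,   # keep each subsequent wait at 0.2 s
    "interval_max": 0.2,
}
# Reusing the app, task name, and schedule_id from the earlier sketch:
app.send_task("spark.run_workflow", args=[schedule_id],
              retry=True, retry_policy=retry_policy)
```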
Schedule Updates and Notifications
When you update a schedule through the API or CLI, Spark notifies the beat process to reload schedules immediately instead of waiting for the next tick:
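One plausible shape for that nudge, shown only as a sketch of the pattern django-celery-beat uses (Spark’s actual mechanism may differ): writers bump a last-changed marker, and the scheduler reloads when the marker moves. As a method on the sketch scheduler above, with a hypothetical `fetch_schedules_last_updated()` helper:

```python
    def schedule_changed(self) -> bool:
        # Compare MAX(updated_at) across schedules to the value seen on
        # the last reload; reload whenever it moves.
        latest = fetch_schedules_last_updated()
        changed = latest != self._last_sync
        if changed:
            self._last_sync = latest
        return changed
```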
Performance Considerations
Database Queries
Every 5 seconds, Celery Beat queries the database for due schedules (see the query sketch after this list). The `FOR UPDATE` lock ensures:
- Only one beat instance processes each schedule
- No race conditions with simultaneous updates
- Consistent read/write behavior
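A sketch of what that per-tick query amounts to (table and column names are inferred from the fields this document describes; the real query is in `DatabaseScheduler`):

```python
# SQL kept in a Python constant for illustration.
DUE_SCHEDULES_SQL = """
    SELECT *
    FROM schedules
    WHERE status = 'active'
      AND next_run_at <= NOW()
    FOR UPDATE
"""
```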
Schedule Limits
The 5-second tick interval means:
- Minimum practical interval: 5 seconds (though `1m` is the recommended minimum)
- Schedule evaluation overhead: ~10-50ms for typical deployments
- Maximum schedules: Tested with 1000+ active schedules without issues
High Availability
You can run multiple Celery Beat instances, but:
- Only one instance should be active at a time
- Use a leader election mechanism (not built into Spark)
- Alternative: Run single beat instance with monitoring/restart
Debugging Schedules
Check Schedule Status
When the beat process starts, it logs the active schedules it loads; check the beat logs first.
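You can also inspect schedules directly in PostgreSQL; a sketch (the client library, connection string, and table name are assumptions based on the fields described above):

```python
import psycopg  # or whichever PostgreSQL client your deployment uses

with psycopg.connect("postgresql://localhost/spark") as conn:
    rows = conn.execute(
        "SELECT id, status, next_run_at FROM schedules ORDER BY next_run_at"
    ).fetchall()
    for schedule_id, status, next_run_at in rows:
        print(schedule_id, status, next_run_at)
```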
Schedule Not Firing?
Common issues:
- Schedule status is not “active”
- `next_run_at` is in the past
  - Beat only fires schedules where `next_run_at <= now`
  - Update the schedule to recalculate `next_run_at`
- Beat process not running
- Invalid cron expression
  - Beat logs will show: `Invalid cron expression for schedule {id}`
  - Validate your cron expression: https://crontab.guru/
Source Code References
- DatabaseScheduler: `automagik_spark/core/celery/scheduler.py`
- SchedulerManager: `automagik_spark/core/scheduler/manager.py`
- Schedule validation: `automagik_spark/core/scheduler/utils.py`
Next Steps
- Learn about Task Execution to understand what happens after a schedule fires
- Explore Adapter System to see how workflows are executed
- See Scaling Production for running multiple workers

