Sagas
Sagas orchestrate long-running workflows across multiple aggregates. Unlike implications (which react automatically), sagas are explicit: a trigger event starts them, they execute steps in order, and they handle failure with compensation.
When to use sagas
Good for: - Multi-step business processes (checkout, onboarding, provisioning) - Operations spanning external services (payment APIs, email, webhooks) - Processes needing human approval - Anything where partial failure requires rollback
Not for: - Simple single-aggregate updates (use regular events) - Automatic reactions to events (use implications) - Single delayed actions (use scheduled events)
You might not need a saga
If it's a single delayed action -- send a reminder in 24 hours, expire a token next week -- use a scheduled event. Sagas are for multi-step workflows where steps depend on each other and failures need to unwind previous work.
Defining sagas
Sagas are defined in modules.sagas in your spec:
{
"modules": {
"sagas": {
"checkout": {
"trigger": {
"aggregate_type": "cart",
"event_type": "checkout_started"
},
"steps": [
{
"name": "reserve_inventory",
"emit": {
"aggregate_type": "inventory",
"id": "$.data.product_id",
"event_type": "reservation_requested",
"data": {
"quantity": "$.data.quantity"
}
},
"await": {
"aggregate_type": "inventory",
"event_types": ["was_reserved", "reservation_failed"]
},
"compensate": {
"aggregate_type": "inventory",
"id": "$.data.product_id",
"event_type": "reservation_released",
"data": {}
},
"timeout_ms": 30000
},
{
"name": "charge_payment",
"condition": {"equals": ["$prev.type", "was_reserved"]},
"emit": {
"aggregate_type": "payment",
"id": "$.data.payment_id",
"event_type": "charge_requested",
"data": {"amount": "$.data.total"}
},
"await": {
"aggregate_type": "payment",
"event_types": ["was_charged", "charge_failed"]
},
"compensate": {
"aggregate_type": "payment",
"id": "$.data.payment_id",
"event_type": "refund_requested",
"data": {}
}
},
{
"name": "notify",
"condition": {"equals": ["$prev.type", "was_charged"]},
"emit": {
"aggregate_type": "notification",
"id": "$.data.customer_id",
"event_type": "order_confirmation_sent",
"data": {
"transaction_id": "$prev.data.transaction_id"
}
}
}
],
"on_complete": {
"aggregate_type": "order",
"id": "$.data.order_id",
"event_type": "checkout_completed",
"data": {}
},
"on_failed": {
"aggregate_type": "order",
"id": "$.data.order_id",
"event_type": "checkout_failed",
"data": {"error": "$error.message"}
}
}
}
}
}
Trigger
"trigger": {
"aggregate_type": "cart",
"event_type": "checkout_started"
}
When checkout_started is written to any cart, this saga starts.
Steps
Each step has a name, an event to emit, and optionally:
- await - event types to wait for (step completes when one arrives)
- compensate - event to emit if a later step fails
- timeout_ms - how long to wait (required for steps with await)
- condition - only run if true
Referencing data
Templates pull data from the trigger event and previous steps:
| Template | Description |
|---|---|
$.key |
Trigger aggregate key (cart:abc123) |
$.id |
Trigger aggregate ID (abc123) |
$.type |
Trigger event type |
$.data.* |
Trigger event data |
$prev.type |
Previous step's response type |
$prev.data.* |
Previous step's response data |
$context.<step>.* |
Earlier step's result by name |
$error.message |
Error message (in on_failed) |
$error.step |
Failed step name (in on_failed) |
The condition on charge_payment checks that inventory was actually reserved before charging. If the previous step returned reservation_failed, this step is skipped.
How sagas execute
- Trigger event is written, saga starts
- Execute step 1: Emit
inventory.reservation_requested - Wait: For a matching
awaitevent or timeout - On success: Proceed to step 2
- On failure: Run compensations in reverse order, mark saga failed
Each step emits its event, then waits for a response matching one of the await.event_types. If no response arrives within timeout_ms, the step fails.
Steps without await
Steps without await succeed immediately after emitting. Use this for fire-and-forget actions like logging:
{
"name": "audit_log",
"emit": {
"aggregate_type": "audit",
"id": "global",
"event_type": "checkout_attempted",
"data": {"cart_id": "$.key"}
}
}
Compensation
When a step fails, the saga walks backward through completed steps and emits their compensate events:
Step 3 fails
-> Compensate step 2
-> Compensate step 1
-> Saga marked failed
Steps without compensate are skipped during rollback (like notify -- nothing to undo).
If compensation itself fails after 3 attempts (max_compensation_attempts), the saga is dead-lettered for manual review.
Lifecycle events
on_complete fires when all steps succeed. on_failed fires after compensation completes. Both are optional.
Saga state
Query saga status via the admin API:
GET /_admin/sagas/:id
Authorization: Bearer <operator-jwt>
Response:
{
"saga_id": "saga_abc123",
"type": "checkout",
"status": "running",
"current_step": "charge_payment",
"started_at": 1705312800,
"steps": [
{"name": "reserve_inventory", "status": "completed", "completed_at": 1705312801},
{"name": "charge_payment", "status": "waiting", "waiting_since": 1705312802}
]
}
Admin API
GET /_admin/sagas?status=running # List sagas
GET /_admin/sagas/:id # Get saga details
POST /_admin/sagas/:id/retry # Retry failed saga
Filter by status (running, completed, failed, dead_lettered), environment, limit, offset.
Retry picks up where it left off, restarting from the failed step. The saga runner polls for work every 1000ms.
Dead letters
If compensation fails 3 times, the saga is dead-lettered. This requires manual intervention:
- Check the dashboard (dead letters appear prominently on the home page)
- Fix the underlying issue
- Retry via the admin API
Example: User onboarding
{
"modules": {
"sagas": {
"user_onboarding": {
"trigger": {
"aggregate_type": "user",
"event_type": "was_created"
},
"steps": [
{
"name": "send_welcome_email",
"emit": {
"aggregate_type": "email",
"id": "$.data.user_id",
"event_type": "was_queued",
"data": {"template": "welcome", "email": "$.data.email"}
}
},
{
"name": "provision_trial",
"emit": {
"aggregate_type": "subscription",
"id": "$.data.user_id",
"event_type": "trial_was_started",
"data": {"plan": "starter"}
},
"compensate": {
"aggregate_type": "subscription",
"id": "$.data.user_id",
"event_type": "trial_was_cancelled",
"data": {}
}
}
]
}
}
}
}
Compared to implications
| Implications | Sagas | |
|---|---|---|
| Trigger | Automatic on event | Trigger event in spec |
| Scope | Event chain (atomic) | Multi-step workflow |
| Compensation | Manual | Built-in |
| Human interaction | No | Yes |
| Failure handling | Retry | Compensation + retry |
| Use case | Simple reactions | Complex processes |
Use implications for if-this-then-that. Use sagas for processes requiring coordination and rollback.
Limitations
- Max 50 steps per saga
- Max 100 concurrent sagas per instance
- Max 3 compensation attempts before dead-lettering
- No nested sagas (start a new saga from a step instead)
Best practices
Keep steps small. Each step should do one thing. Easier to compensate, easier to debug.
Design compensations first. Before writing the action, know how to undo it.
Set realistic timeouts. Account for external API latency, human response time.
Monitor compensation rate. High compensation means the process design needs rethinking.
See also
- Implications guide - Simple reactive chains
- Scheduled events - Delayed execution