Sagas

Sagas orchestrate long-running workflows across multiple aggregates. Unlike implications (which react automatically), sagas are explicit: a trigger event starts them, they execute steps in order, and they handle failure with compensation.

When to use sagas

Good for: - Multi-step business processes (checkout, onboarding, provisioning) - Operations spanning external services (payment APIs, email, webhooks) - Processes needing human approval - Anything where partial failure requires rollback

Not for: - Simple single-aggregate updates (use regular events) - Automatic reactions to events (use implications) - Single delayed actions (use scheduled events)

You might not need a saga

If it's a single delayed action -- send a reminder in 24 hours, expire a token next week -- use a scheduled event. Sagas are for multi-step workflows where steps depend on each other and failures need to unwind previous work.

Defining sagas

Sagas are defined in modules.sagas in your spec:

{
  "modules": {
    "sagas": {
      "checkout": {
        "trigger": {
          "aggregate_type": "cart",
          "event_type": "checkout_started"
        },
        "steps": [
          {
            "name": "reserve_inventory",
            "emit": {
              "aggregate_type": "inventory",
              "id": "$.data.product_id",
              "event_type": "reservation_requested",
              "data": {
                "quantity": "$.data.quantity"
              }
            },
            "await": {
              "aggregate_type": "inventory",
              "event_types": ["was_reserved", "reservation_failed"]
            },
            "compensate": {
              "aggregate_type": "inventory",
              "id": "$.data.product_id",
              "event_type": "reservation_released",
              "data": {}
            },
            "timeout_ms": 30000
          },
          {
            "name": "charge_payment",
            "condition": {"equals": ["$prev.type", "was_reserved"]},
            "emit": {
              "aggregate_type": "payment",
              "id": "$.data.payment_id",
              "event_type": "charge_requested",
              "data": {"amount": "$.data.total"}
            },
            "await": {
              "aggregate_type": "payment",
              "event_types": ["was_charged", "charge_failed"]
            },
            "compensate": {
              "aggregate_type": "payment",
              "id": "$.data.payment_id",
              "event_type": "refund_requested",
              "data": {}
            }
          },
          {
            "name": "notify",
            "condition": {"equals": ["$prev.type", "was_charged"]},
            "emit": {
              "aggregate_type": "notification",
              "id": "$.data.customer_id",
              "event_type": "order_confirmation_sent",
              "data": {
                "transaction_id": "$prev.data.transaction_id"
              }
            }
          }
        ],
        "on_complete": {
          "aggregate_type": "order",
          "id": "$.data.order_id",
          "event_type": "checkout_completed",
          "data": {}
        },
        "on_failed": {
          "aggregate_type": "order",
          "id": "$.data.order_id",
          "event_type": "checkout_failed",
          "data": {"error": "$error.message"}
        }
      }
    }
  }
}

Trigger

"trigger": {
  "aggregate_type": "cart",
  "event_type": "checkout_started"
}

When checkout_started is written to any cart, this saga starts.

Steps

Each step has a name, an event to emit, and optionally: - await - event types to wait for (step completes when one arrives) - compensate - event to emit if a later step fails - timeout_ms - how long to wait (required for steps with await) - condition - only run if true

Referencing data

Templates pull data from the trigger event and previous steps:

Template Description
$.key Trigger aggregate key (cart:abc123)
$.id Trigger aggregate ID (abc123)
$.type Trigger event type
$.data.* Trigger event data
$prev.type Previous step's response type
$prev.data.* Previous step's response data
$context.<step>.* Earlier step's result by name
$error.message Error message (in on_failed)
$error.step Failed step name (in on_failed)

The condition on charge_payment checks that inventory was actually reserved before charging. If the previous step returned reservation_failed, this step is skipped.

How sagas execute

  1. Trigger event is written, saga starts
  2. Execute step 1: Emit inventory.reservation_requested
  3. Wait: For a matching await event or timeout
  4. On success: Proceed to step 2
  5. On failure: Run compensations in reverse order, mark saga failed

Each step emits its event, then waits for a response matching one of the await.event_types. If no response arrives within timeout_ms, the step fails.

Steps without await

Steps without await succeed immediately after emitting. Use this for fire-and-forget actions like logging:

{
  "name": "audit_log",
  "emit": {
    "aggregate_type": "audit",
    "id": "global",
    "event_type": "checkout_attempted",
    "data": {"cart_id": "$.key"}
  }
}

Compensation

When a step fails, the saga walks backward through completed steps and emits their compensate events:

Step 3 fails
-> Compensate step 2
-> Compensate step 1
-> Saga marked failed

Steps without compensate are skipped during rollback (like notify -- nothing to undo).

If compensation itself fails after 3 attempts (max_compensation_attempts), the saga is dead-lettered for manual review.

Lifecycle events

on_complete fires when all steps succeed. on_failed fires after compensation completes. Both are optional.

Saga state

Query saga status via the admin API:

GET /_admin/sagas/:id
Authorization: Bearer <operator-jwt>

Response:

{
  "saga_id": "saga_abc123",
  "type": "checkout",
  "status": "running",
  "current_step": "charge_payment",
  "started_at": 1705312800,
  "steps": [
    {"name": "reserve_inventory", "status": "completed", "completed_at": 1705312801},
    {"name": "charge_payment", "status": "waiting", "waiting_since": 1705312802}
  ]
}

Admin API

GET  /_admin/sagas?status=running     # List sagas
GET  /_admin/sagas/:id                # Get saga details
POST /_admin/sagas/:id/retry          # Retry failed saga

Filter by status (running, completed, failed, dead_lettered), environment, limit, offset.

Retry picks up where it left off, restarting from the failed step. The saga runner polls for work every 1000ms.

Dead letters

If compensation fails 3 times, the saga is dead-lettered. This requires manual intervention:

  1. Check the dashboard (dead letters appear prominently on the home page)
  2. Fix the underlying issue
  3. Retry via the admin API

Example: User onboarding

{
  "modules": {
    "sagas": {
      "user_onboarding": {
        "trigger": {
          "aggregate_type": "user",
          "event_type": "was_created"
        },
        "steps": [
          {
            "name": "send_welcome_email",
            "emit": {
              "aggregate_type": "email",
              "id": "$.data.user_id",
              "event_type": "was_queued",
              "data": {"template": "welcome", "email": "$.data.email"}
            }
          },
          {
            "name": "provision_trial",
            "emit": {
              "aggregate_type": "subscription",
              "id": "$.data.user_id",
              "event_type": "trial_was_started",
              "data": {"plan": "starter"}
            },
            "compensate": {
              "aggregate_type": "subscription",
              "id": "$.data.user_id",
              "event_type": "trial_was_cancelled",
              "data": {}
            }
          }
        ]
      }
    }
  }
}

Compared to implications

Implications Sagas
Trigger Automatic on event Trigger event in spec
Scope Event chain (atomic) Multi-step workflow
Compensation Manual Built-in
Human interaction No Yes
Failure handling Retry Compensation + retry
Use case Simple reactions Complex processes

Use implications for if-this-then-that. Use sagas for processes requiring coordination and rollback.

Limitations

  • Max 50 steps per saga
  • Max 100 concurrent sagas per instance
  • Max 3 compensation attempts before dead-lettering
  • No nested sagas (start a new saga from a step instead)

Best practices

Keep steps small. Each step should do one thing. Easier to compensate, easier to debug.

Design compensations first. Before writing the action, know how to undo it.

Set realistic timeouts. Account for external API latency, human response time.

Monitor compensation rate. High compensation means the process design needs rethinking.

See also

Can't find what you need? support@j17.app