Lifecycle and hooks

In plain words

Every time this program runs, it goes through the same four stages in the same order: starting, ready to work, finishing up leftover work, and fully stopped. Think of it like a shift at a shop: unlock the door and set up, open for customers, stop letting new customers in and finish serving the ones already inside, then lock up.

Before it ever opens its doors to customers, it walks through a long checklist: is the configuration file readable, is the database reachable, do the different parts of the app agree on what they are talking about, are there any typos or missing pieces that would cause a crash later. If anything on that checklist is wrong, the shop simply never opens, and whoever runs it gets a full list of everything that’s broken, not just the first problem. That is the main promise here: a broken setup gets caught before real customers ever see it, instead of causing an incident later.

When it’s time to shut down (say, the server needs to restart), it doesn’t slam the door. It stops accepting new customers, finishes serving everyone already being helped, cleans up, and only then turns off the lights, all within a time limit so it doesn’t hang forever.

There are also two public signs on the door: one says “I’m alive” (used to decide whether to restart it if it crashed), and one says “I’m ready for business” (used to decide whether to send customers its way). The rest of this page explains exactly what happens in each stage and how a developer can hook their own logic into specific moments of that timeline.

Why it exists

A Kumiko process moves through four lifecycle states in fixed order:

starting → ready → draining → stopped

Each transition is observable, the order never reverses, and the /health/ready endpoint reflects the current state. The framework runs this lifecycle for every process — API server, worker, outbox poller — and there is no per-feature variant.
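The forward-only rule can be pictured as a small state machine. The following is a minimal sketch, not the framework's internal API: the `Lifecycle` class and its method names are hypothetical, and allowing a skip forward (for example from starting straight to stopped after a failed boot) is an assumption of the sketch.

```typescript
// Hypothetical model of the four lifecycle states; not Kumiko's internal API.
type LifecycleState = "starting" | "ready" | "draining" | "stopped";

const ORDER: readonly LifecycleState[] = ["starting", "ready", "draining", "stopped"];

type TransitionListener = (from: LifecycleState, to: LifecycleState) => void;

class Lifecycle {
  private current: LifecycleState = "starting";
  private readonly listeners: TransitionListener[] = [];

  get state(): LifecycleState {
    return this.current;
  }

  // Every transition is observable: listeners are called on each move.
  onTransition(listener: TransitionListener): void {
    this.listeners.push(listener);
  }

  // Only forward moves are accepted, so the order can never reverse.
  transitionTo(next: LifecycleState): void {
    const from = this.current;
    if (ORDER.indexOf(next) <= ORDER.indexOf(from)) {
      throw new Error(`illegal transition ${from} -> ${next}`);
    }
    this.current = next;
    for (const listener of this.listeners) listener(from, next);
  }
}
```

A readiness endpoint built on top of this would only need to read `state`, which is why one endpoint can reflect the whole lifecycle.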

This page is about the framework lifecycle, the boot-time validation it performs, and the hook points where feature code can attach. The detailed hook contract — phases, ordering, transactional behaviour — lives on Events and projections.

What runs at boot

Boot is sixteen phases in a fixed order. Each one waits for the previous, each one has a timeout, and each one decides what happens on failure:

1.  Load configuration            (kumiko.config.ts, env, defaults)
2.  Initialise observability      (tracing, metrics, structured logs)
3.  Initialise secrets provider   (env, vault, KMS)
4.  Runtime checks                (Bun version, polyfills)
5.  Register features             (every defineFeature() runs)
6.  Boot validation               (the feature graph is checked)
7.  Connect database              (with retry/backoff)
8.  Schema baseline check         (api-evolution, optional)
9.  Migration check               (apply / warn / exit on pending)
10. Connect Redis                 (with retry/backoff)
11. Initialise search adapter     (Meilisearch healthcheck)
12. Initialise file storage       (S3-compatible)
13. Start outbox poller
14. Start jobs worker             (if jobs feature is loaded)
15. Bind API listener
16. State → ready

The failure behaviour is set per phase. Configuration errors cause an immediate exit; network connectivity is retried with exponential backoff; observability falls back to a no-op provider rather than blocking boot. The point of the phase model is that the order of failure is deterministic: operators reading a boot log always see the same sequence, and a particular phase failing always means the same thing.
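To make the three failure behaviours concrete, here is a rough sketch of a phase runner. Everything in it is illustrative: `FailurePolicy`, `Phase`, `backoffDelayMs`, and `runBoot` are hypothetical names, the doubling factor and the 30-second cap are assumptions, and waiting is recorded in the log rather than slept so the sketch stays synchronous.

```typescript
// Hypothetical policy names mirroring the behaviours described above.
type FailurePolicy =
  | { kind: "exit" }
  | { kind: "retry"; maxAttempts: number; baseDelayMs: number }
  | { kind: "fallback"; use: () => void };

interface Phase {
  name: string;
  run: () => void; // throws on failure
  onFailure: FailurePolicy;
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs (assumed cap).
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs = 30_000): number {
  return Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
}

// Runs phases strictly in order and returns the boot log.
function runBoot(phases: Phase[]): string[] {
  const log: string[] = [];
  for (const phase of phases) {
    const policy = phase.onFailure;
    const attempts = policy.kind === "retry" ? policy.maxAttempts : 1;
    let ok = false;
    for (let attempt = 1; attempt <= attempts && !ok; attempt++) {
      try {
        phase.run();
        ok = true;
      } catch {
        if (policy.kind === "retry" && attempt < attempts) {
          // A real runner would sleep here; the sketch only records the delay.
          log.push(`${phase.name}: retry in ${backoffDelayMs(attempt, policy.baseDelayMs)}ms`);
        }
      }
    }
    if (ok) {
      log.push(`${phase.name}: ok`);
    } else if (policy.kind === "fallback") {
      policy.use();
      log.push(`${phase.name}: fallback`);
    } else {
      log.push(`${phase.name}: failed, exiting`);
      return log; // later phases never run
    }
  }
  log.push("state: ready");
  return log;
}
```

Because the loop never reorders or skips phases, the same failure always shows up at the same position in the log, which is the determinism the phase model is after.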

Two settings affect this in production. KUMIKO_STARTUP_TIMEOUT (default two minutes) caps the total boot time; a process that hasn’t reached ready by then exits, which prevents zombie workers stuck on unreachable infrastructure. The migrations.mode setting controls what phase 9 does when migrations are pending: exit, auto-apply, or warn. Pick the one that fits your deploy strategy.
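The migration decision in phase 9 can be read as a pure function of the mode and the list of pending migrations. The sketch below is an illustration only: the literal mode names `"exit"`, `"apply"`, and `"warn"`, the `MigrationDecision` shape, and the sample migration name are assumptions, since this page describes the three behaviours but not the exact configuration values.

```typescript
// Hypothetical literal names; the page describes three behaviours, not exact values.
type MigrationsMode = "exit" | "apply" | "warn";

type MigrationDecision =
  | { action: "continue" }
  | { action: "apply"; count: number }
  | { action: "warn"; message: string }
  | { action: "exit"; code: 1; message: string };

function decideMigrationPhase(mode: MigrationsMode, pending: string[]): MigrationDecision {
  // Nothing pending: every mode lets boot continue.
  if (pending.length === 0) return { action: "continue" };
  const message = `${pending.length} pending migration(s): ${pending.join(", ")}`;
  switch (mode) {
    case "apply":
      return { action: "apply", count: pending.length };
    case "warn":
      return { action: "warn", message };
    case "exit":
      return { action: "exit", code: 1, message };
  }
}
```

For example, `decideMigrationPhase("exit", ["0007_add_index"])` yields an exit decision, while the same call with an empty list lets boot continue regardless of mode.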

Boot validation: the feature graph check

Phase 6 — boot validation — is where Kumiko earns its “no runtime surprises” claim. Before any handler runs, the framework walks the entire feature graph and confirms it is internally consistent:

Check	What it catches
`r.requires("x")` resolves	Missing dependency that would crash on first call
Cross-feature handler reference exists	`ctx.write("orders:create")` when orders has no such handler
No circular dependencies	`A requires B requires A` would otherwise loop
Config keys read by features are declared	Typos turn into immediate boot errors, not runtime null
No entity name collisions	Two features each declaring `r.entity("user", …)`
`encrypted` and `searchable` are mutually exclusive	Encrypted columns cannot be indexed in plaintext
Registrar extensions are backed by `r.requires`	`r.customFields(…)` called without declaring the feature that provides it
`$user.*` ownership bindings exist	Typo in `$user.teamId` becomes a boot error

Anything that fails is reported as a list — not the first error, all of them. The exit code is non-zero, the log carries the full details, and the process never reaches ready. Production never sees any of these because production never starts with them.
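The "all errors, not just the first" behaviour is easiest to see in code. The following sketch covers three of the checks from the table (unresolved `requires`, entity name collisions, circular dependencies) over a deliberately simplified model. `FeatureSpec` and `validateFeatureGraph` are hypothetical names and the error messages are made up; the real validator presumably works on the full registrar state.

```typescript
// Hypothetical, simplified feature description; real features carry far more.
interface FeatureSpec {
  name: string;
  requires: string[];
  entities: string[];
}

// Collects every problem instead of throwing on the first one.
function validateFeatureGraph(features: FeatureSpec[]): string[] {
  const errors: string[] = [];
  const byName = new Map<string, FeatureSpec>();
  for (const f of features) byName.set(f.name, f);

  // 1. Every declared dependency must resolve to a registered feature.
  for (const f of features) {
    for (const dep of f.requires) {
      if (!byName.has(dep)) errors.push(`${f.name}: requires unknown feature "${dep}"`);
    }
  }

  // 2. No two features may declare the same entity name.
  const entityOwner = new Map<string, string>();
  for (const f of features) {
    for (const entity of f.entities) {
      const owner = entityOwner.get(entity);
      if (owner !== undefined) {
        errors.push(`entity "${entity}" declared by both ${owner} and ${f.name}`);
      } else {
        entityOwner.set(entity, f.name);
      }
    }
  }

  // 3. No circular dependencies (depth-first search with a "visiting" set).
  const done = new Set<string>();
  const visiting = new Set<string>();
  const visit = (name: string, path: string[]): void => {
    if (done.has(name)) return;
    if (visiting.has(name)) {
      const cycle = [...path.slice(path.indexOf(name)), name];
      errors.push(`circular dependency: ${cycle.join(" -> ")}`);
      return;
    }
    visiting.add(name);
    for (const dep of byName.get(name)?.requires ?? []) visit(dep, [...path, name]);
    visiting.delete(name);
    done.add(name);
  };
  for (const f of features) visit(f.name, []);

  return errors; // an empty list means the graph is consistent
}
```

A boot sequence built on this would print the whole list and exit non-zero whenever it is non-empty, which is also what makes the check usable as a CI gate.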

This is the single most consequential property of the lifecycle model: configuration mistakes cannot ship. They become CI failures, not incident reports.

Where feature code can attach

The framework gives feature authors three families of attachment points across the lifecycle. Each one runs at a different phase and has a different transactional contract.

Boot-time hooks. A feature’s defineFeature body runs in phase 5. Anything you do there — registering entities, declaring events, attaching hooks, computing derived configuration — is a boot-time hook. By the time phase 6 runs, every feature has finished registering, and the registrar is frozen. There is no “register a handler at runtime” path.
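The "frozen after registration" rule could be enforced along these lines. This is a sketch under stated assumptions: the `Registrar` class here is a stand-in with hypothetical `handler`, `freeze`, and `has` methods, not the real registrar surface.

```typescript
// Hypothetical registrar stand-in; method names are illustrative only.
class Registrar {
  private frozen = false;
  private readonly handlers = new Map<string, () => void>();

  // Allowed only while features are still registering (phase 5 in the list above).
  handler(name: string, fn: () => void): void {
    if (this.frozen) {
      throw new Error(`cannot register "${name}": registrar is frozen`);
    }
    this.handlers.set(name, fn);
  }

  // Called once every feature has registered, before validation runs.
  freeze(): void {
    this.frozen = true;
  }

  has(name: string): boolean {
    return this.handlers.has(name);
  }
}
```

Freezing before validation means the graph that gets checked is exactly the graph that later serves traffic.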

Lifecycle hooks (per write). An r.hook("postSave", "incident", …) attaches to the lifecycle of an entity write, not the process lifecycle. It runs once per matching write, in either the inTransaction or the afterCommit phase, as you choose. The full contract is on Events and projections. For the purposes of this page, the relevant point is that these hooks fire while the process is in the ready state, not at boot or during shutdown.

Background workers. Jobs and projections that run outside the request path are owned by the framework, not by feature code. A feature declares them — r.job("daily-report", { trigger: { cron: "0 9 * * *" } }, handler), r.multiStreamProjection({ … }) — and the framework starts them in phases 13-14, supervises them via heartbeat, and stops them during shutdown. Jobs accept three trigger shapes: { on: eventDef } (event-driven), { cron: "…" } (scheduled), or { manual: true } (queue-only).
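The three trigger shapes map naturally onto a TypeScript union. The types below are a hypothetical model written for illustration (`EventDef`, `JobTrigger`, and both helper functions are not taken from the framework), but they show how each shape can be told apart by the key it carries.

```typescript
// Hypothetical types modelling the three trigger shapes named above.
interface EventDef {
  name: string;
}

type JobTrigger =
  | { on: EventDef } // event-driven
  | { cron: string } // scheduled
  | { manual: true }; // queue-only

// Each shape has a distinct key, so an `in` check narrows the union.
function describeTrigger(trigger: JobTrigger): string {
  if ("on" in trigger) return `event-driven (${trigger.on.name})`;
  if ("cron" in trigger) return `scheduled (${trigger.cron})`;
  return "queue-only";
}

// Per the leader-election section of this page, only cron jobs are tied to the leader.
function runsOnLeaderOnly(trigger: JobTrigger): boolean {
  return "cron" in trigger;
}
```

Calling `describeTrigger({ cron: "0 9 * * *" })` returns `"scheduled (0 9 * * *)"`.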

Feature code never calls lifecycle.registerStartupPhase(…). The process-lifecycle API is internal, and the available registrar methods are the public surface.

Graceful shutdown

When the process receives SIGTERM (the orchestrator’s “stop, please”), the lifecycle moves to draining and the shutdown sequence runs:

draining state begins
  ↓
/health/ready returns 503         (load balancer drains traffic)
  ↓
Linger 3 seconds                  (give the LB time to react)
  ↓
Close the API listener            (no new connections)
  ↓
Wait for in-flight requests       (drain timeout, default 30s)
  ↓
Close SSE broker connections
  ↓
Stop the outbox poller            (finish current batch)
  ↓
Stop the jobs worker              (finish current jobs)
  ↓
Close Redis and database pools
  ↓
Flush observability               (send pending traces and metrics)
  ↓
state → stopped, exit(0)

The whole sequence has a hard timeout (KUMIKO_SHUTDOWN_TIMEOUT, default 40 seconds). After that, the process force-exits with a warning log. The hard cap exists because Kubernetes will send SIGKILL after its own grace period — better to exit cleanly with a logged warning than to be killed in the middle of a flush.
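Plugging in the defaults quoted above shows how the time budget splits. The constant and function names are made up for this sketch; only the three durations come from this page. One point worth checking against your own cluster: Kubernetes' default terminationGracePeriodSeconds is 30, so the 40-second hard timeout only helps if the pod's grace period is configured to be longer than that.

```typescript
// Defaults quoted on this page, in milliseconds.
const LINGER_MS = 3_000;
const DRAIN_TIMEOUT_MS = 30_000;
const SHUTDOWN_TIMEOUT_MS = 40_000;

// Worst case spent on linger plus waiting for in-flight requests.
const worstCaseBeforeCleanupMs = LINGER_MS + DRAIN_TIMEOUT_MS; // 33000

// What remains for SSE, outbox, jobs, pools and the observability flush.
const cleanupBudgetMs = SHUTDOWN_TIMEOUT_MS - worstCaseBeforeCleanupMs; // 7000

// The hard timeout only prevents a SIGKILL if it expires before the
// orchestrator's grace period does.
function hardTimeoutFitsGracePeriod(gracePeriodSeconds: number): boolean {
  return SHUTDOWN_TIMEOUT_MS < gracePeriodSeconds * 1000;
}
```

With these numbers, a slow drain leaves roughly seven seconds for everything after the in-flight requests finish.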

Background components register shutdown hooks during boot. They run in LIFO order: the last component to start is the first one to stop. Feature code does not register shutdown hooks; that surface is reserved for core features like core-jobs that own background work.
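LIFO ordering falls out of a simple stack of hooks. The registry below is a hypothetical stand-in (the real hook surface is internal), and the hooks are synchronous only to keep the sketch short; real shutdown hooks would presumably be asynchronous.

```typescript
// Hypothetical registry; the real shutdown-hook surface is internal to the framework.
type ShutdownHook = { name: string; stop: () => void };

class ShutdownRegistry {
  private readonly hooks: ShutdownHook[] = [];

  // Called by each background component as it starts during boot.
  register(hook: ShutdownHook): void {
    this.hooks.push(hook);
  }

  // Last registered, first stopped. Returns the names in stop order.
  runAll(): string[] {
    const stopped: string[] = [];
    for (const hook of [...this.hooks].reverse()) {
      hook.stop();
      stopped.push(hook.name);
    }
    return stopped;
  }
}
```

Registering `database`, then `redis`, then `api-listener` stops them as `api-listener`, `redis`, `database`, so nothing is torn down while a later-started component still depends on it.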

Health endpoints

Two endpoints expose the lifecycle to the outside world:

/health is liveness. It returns 200 as long as the process is alive, regardless of state. Orchestrators use this to decide whether to restart the container.
/health/ready is readiness. It returns 200 only in the ready state, with all dependencies healthy. Orchestrators use this to decide whether to send traffic.

Readiness includes per-component checks: database latency, Redis connectivity, search adapter, outbox poller heartbeat, jobs worker heartbeat, scheduler leader status. A 503 from /health/ready carries the failing checks in the body, so an operator looking at one HTTP response sees which subsystem is unhealthy.
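A readiness handler along these lines would produce that behaviour. The response shape is an assumption made for illustration: `CheckResult`, `ReadinessResult`, and the `failing` field are hypothetical, and the real body may be structured differently.

```typescript
// Hypothetical shapes; the real response body may differ.
type State = "starting" | "ready" | "draining" | "stopped";

interface CheckResult {
  name: string;
  healthy: boolean;
  detail?: string;
}

interface ReadinessResult {
  status: 200 | 503;
  body: { state: State; failing: CheckResult[] };
}

// 200 only in the ready state with every component check healthy.
function readiness(state: State, checks: CheckResult[]): ReadinessResult {
  const failing = checks.filter((c) => !c.healthy);
  const status = state === "ready" && failing.length === 0 ? 200 : 503;
  return { status, body: { state, failing } };
}
```

Note that a draining process answers 503 even when every check passes, which is what lets the load balancer stop routing to it before the listener closes.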

Exclusive tasks: leader election

A Kumiko deployment with multiple processes needs to ensure that each scheduled job runs exactly once. A daily report at 09:00 should fire on one worker, not on all of them. The framework runs a Redis-backed leader election: every jobs worker tries to claim the leader lock at boot; the holder refreshes it every five seconds; followers wait. Only the leader runs cron-scheduled jobs. Workers that pick up event-triggered or manually triggered jobs do so via the queue, which fans out work correctly across all workers.

Feature code has no am-I-leader accessor. The election is a process concern, and the abstraction is “schedule this; the framework runs it once across the cluster”.
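For readers curious about the mechanism, the claim-and-refresh loop can be sketched with an in-memory stand-in for the Redis key. `LeaderLock` and its method are hypothetical, and the lock lifetime is an assumption (this page gives the five-second refresh interval but not the expiry). With real Redis the claim would typically be an atomic set-if-absent with an expiry rather than two separate steps.

```typescript
// In-memory stand-in for a Redis key with an expiry; names are hypothetical.
class LeaderLock {
  private holder: string | null = null;
  private expiresAtMs = 0;
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Claims the lock if it is free or expired, or refreshes it if this worker
  // already holds it. Returns true when the caller is the leader afterwards.
  tryAcquireOrRefresh(workerId: string, nowMs: number): boolean {
    const free = this.holder === null || nowMs >= this.expiresAtMs;
    if (free || this.holder === workerId) {
      this.holder = workerId;
      this.expiresAtMs = nowMs + this.ttlMs;
      return true;
    }
    return false; // someone else is leader; stay a follower
  }
}
```

With a 15-second lifetime (an assumed value) and a five-second refresh, a leader that crashes is replaced once its lock expires, because a follower's next attempt then succeeds.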

What this gives you

The fixed-order startup, boot-time validation, and graceful shutdown add up to two operational properties:

Misconfiguration cannot reach production. The feature graph, required handlers, registered config keys, and access rules are all checked before the API listener binds. CI exits 1; production never sees the broken state.
Deploys are uneventful. The shutdown sequence and health endpoints are written for rolling and blue-green deployments. A new instance comes up, /health/ready flips to 200, and traffic routes there; the old instance receives SIGTERM, drains, and exits cleanly.

Everything around the handler — process state, retries, leader election, hook ordering — is the framework’s job. The handler body is just the business decision.

Live example

Two validation hooks (one per write handler) plus a postSave entity hook that logs every save in the same transaction:

  // --- Validation hook on create: reject banned words + length ---
  r.hook("validation", articleCreate, (data) => {
    const title = data["title"] as string;
    if (title.toLowerCase().includes("spam")) {
      return [{ field: "title", error: "title_contains_banned_word" }];
    }
    if (title.length > 200) {
      return [{ field: "title", error: "title_too_long" }];
    }
    return null;
  });

  // --- Validation hook on update: length check on title changes ---
  r.hook("validation", articleUpdate, (data) => {
    const changes = data["changes"] as Record<string, unknown> | undefined;
    const title = changes?.["title"] as string | undefined;
    if (title && title.length > 200) {
      return [{ field: "title", error: "title_too_long" }];
    }
    return null;
  });

  // --- postSave entity hook: log all saves ---
  r.entityHook("postSave", article, async (result: SaveContext) => {
    hookLog.push({
      type: result.isNew ? "created" : "updated",
      data: { id: result.id, changes: result.changes },
    });
  });

Full source: samples/recipes/lifecycle-hooks — covers preDelete and postDelete too.