Resonate HQ · Just published · 5 min read

Multi-agent pipeline with durable handoffs in TypeScript on Resonate

How three sequential LLM calls collapse to three lines of generator code when every yield is a Resonate checkpoint.

A pipeline of three LLM agents (researcher, writer, reviewer) must survive the failure of any single agent without re-running the earlier ones, because each call is slow, costly, and non-deterministic. With Resonate, each agent is a plain async function, and a generator workflow orchestrates them so that every yield* ctx.run(...) is a durable checkpoint: an agent that throws is retried in place, while the results of the agents that already completed stay cached in the promise store. The example runs the pipeline on the happy path and with a forced first-attempt failure in the writer, and includes a commented-out swap to a real human-in-the-loop step via ctx.promise.

The shape of the solution

// OrchestrationResult defined at src/workflow.ts:21-27
export function* orchestrate(
  ctx: Context,
  topic: string,
  crashOnWriter: boolean,
): Generator<any, OrchestrationResult, any> {
  // Step 1: Research — gather findings
  const findings = yield* ctx.run(researcher, topic);
 
  // Step 2: Write — produce a draft from findings
  // If crashOnWriter=true, the writer fails on first attempt and retries.
  // The researcher does NOT re-run on retry — its result is cached.
  const draft = yield* ctx.run(writer, topic, findings, crashOnWriter);
 
  // Step 3: Review — check the draft quality
  const review = yield* ctx.run(reviewer, draft);
 
  // Step 4: Human approval (simulated in demo)
  // In production:
  //   const approval = yield* ctx.promise({});
  //   // surface approval.id externally (email, dashboard) so a human can resolve it
  //   const decision = yield* approval;
  // This blocks until an external system resolves the promise.
  const approved = review.toUpperCase().includes("APPROVED");
 
  return {
    status: approved ? "published" : "rejected",
    topic,
    findings,
    draft,
    review,
  };
}
// from example-multi-agent-orchestration-ts/src/workflow.ts:29-60

The orchestrator is a generator function (function*), not async. Each yield* ctx.run(agent, ...) invokes the child agent under a durable promise and suspends the orchestrator until the result is checkpointed in the promise store.
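To make the checkpoint-and-replay idea concrete, here is a toy model in plain TypeScript. It is a sketch under stated assumptions, not the Resonate SDK: ToyContext, drive, and the in-memory Map standing in for the promise store are all hypothetical names, and the toy simulates a crash followed by a replay rather than an in-place retry.

```typescript
// Toy model of checkpointed generator replay. NOT the Resonate SDK:
// ToyContext, drive, and the Map-based store are hypothetical stand-ins.
type AsyncFn = (...args: any[]) => Promise<any>;
type Step = { fn: AsyncFn; args: any[] };

class ToyContext {
  // Hands a step descriptor to the driver and returns whatever the driver sends back.
  *run(fn: AsyncFn, ...args: any[]): Generator<Step, any, any> {
    return yield { fn, args };
  }
}

// Runs a workflow to completion, checkpointing each step result under `${id}.${seq}`.
async function drive<R>(
  store: Map<string, unknown>,
  id: string,
  workflow: (ctx: ToyContext) => Generator<Step, R, any>,
): Promise<R> {
  const gen = workflow(new ToyContext());
  let input: unknown = undefined;
  for (let seq = 0; ; seq++) {
    const next = gen.next(input);
    if (next.done) return next.value;
    const key = `${id}.${seq}`;
    if (!store.has(key)) {
      // First execution of this step: run the function and checkpoint its result.
      store.set(key, await next.value.fn(...next.value.args));
    }
    // On replay the cached value is fed back without re-invoking the function.
    input = store.get(key);
  }
}

const calls = { researcher: 0, writer: 0 };

async function researcher(topic: string): Promise<string> {
  calls.researcher++;
  return `findings about ${topic}`;
}

async function writer(findings: string): Promise<string> {
  calls.writer++;
  if (calls.writer === 1) throw new Error("simulated crash");
  return `draft based on ${findings}`;
}

function* pipeline(ctx: ToyContext): Generator<Step, string, any> {
  const findings: string = yield* ctx.run(researcher, "durable execution");
  const draft: string = yield* ctx.run(writer, findings);
  return draft;
}

async function demoReplay(): Promise<{ draft: string; researcher: number; writer: number }> {
  const store = new Map<string, unknown>();
  try {
    await drive(store, "run-1", pipeline); // first attempt dies inside writer
  } catch {
    // A real worker would exit here; the store outlives the crash.
  }
  const draft = await drive(store, "run-1", pipeline); // replay under the same id
  return { draft, ...calls };
}
```

In this toy, the second drive call replays the generator from the top, but the researcher step is served from the store, so only the writer runs again.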

The durable primitives in play

  • new Resonate() — constructs the Resonate client embedded in the worker process; no external server required for the default run. src/index.ts:8.
  • resonate.register(orchestrate) — registers the top-level workflow so the worker can claim and execute it under a caller-supplied id. src/index.ts:9.
  • resonate.run(runId, orchestrate, topic, crashMode) — invokes the registered workflow with a caller-supplied id; the runId is defined at src/index.ts:27 (orchestration-${Date.now()}) and the call itself is at src/index.ts:29. Resolves to the workflow's return value.
  • yield* ctx.run(fn, ...args) — runs an async function as a durable child step. The return value is persisted at the call site; on replay the SDK returns the cached value rather than re-invoking fn. Used for all three agent calls. src/workflow.ts:35, :40, :43.
  • yield* ctx.promise({}) — referenced in the commented production block (src/workflow.ts:47-49, README:107-140) as the human-in-the-loop primitive. Returns a durable promise with an auto-generated id; yielding the promise blocks the workflow until something external resolves it.
  • resonate.stop() — shuts down the underlying network transport and message source after the run completes; the client should not be used for further operations. src/index.ts:43; SDK doc at resonate.d.ts:154-156.

ctx.sleep, ctx.detached, ctx.beginRun, and resonate.schedule are not used in this example.
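The human-in-the-loop primitive is easier to picture with a small model. The sketch below is a hypothetical in-memory stand-in (ToyPromiseStore, approvalStep, and the "approve" decision string are assumptions, not Resonate APIs): a step creates a promise, surfaces its id, and blocks until some other code resolves that id. Unlike a durable promise, this toy does not survive a process restart.

```typescript
// Toy stand-in for a promise that is resolved from outside the workflow.
// NOT the Resonate API; all names here are hypothetical.
class ToyPromiseStore {
  private pending = new Map<string, (value: string) => void>();
  private nextId = 0;

  // Creates a pending promise and returns it together with its generated id.
  create(): { id: string; promise: Promise<string> } {
    const id = `approval-${this.nextId++}`;
    const promise = new Promise<string>((resolve) => {
      this.pending.set(id, resolve);
    });
    return { id, promise };
  }

  // Resolves a pending promise by id. Returns false if the id is unknown
  // or has already been resolved.
  resolve(id: string, value: string): boolean {
    const resolver = this.pending.get(id);
    if (!resolver) return false;
    this.pending.delete(id);
    resolver(value);
    return true;
  }
}

// The workflow side: create the promise, surface its id, then block on it.
async function approvalStep(
  store: ToyPromiseStore,
  notify: (id: string) => void,
): Promise<"published" | "rejected"> {
  const approval = store.create();
  notify(approval.id); // e.g. send the id to a dashboard or an email
  const decision = await approval.promise;
  return decision === "approve" ? "published" : "rejected";
}

async function demoApproval(): Promise<{ status: string; secondResolve: boolean }> {
  const store = new ToyPromiseStore();
  let surfacedId = "";
  const pending = approvalStep(store, (id) => {
    surfacedId = id;
  });
  // The "external system": resolves the promise using only the surfaced id.
  store.resolve(surfacedId, "approve");
  // A second resolution of the same id is ignored.
  const secondResolve = store.resolve(surfacedId, "reject");
  return { status: await pending, secondResolve };
}
```

The point of the shape is that the resolver needs nothing but the id, which is why the commented production block surfaces approval.id externally.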

What the SDK handles vs. what you write

| SDK handles | You write |
| --- | --- |
| Checkpointing the return value of each yield* ctx.run(agent, ...) in the durable promise store | The three yield* ctx.run(...) lines and the three agent functions (researcher, writer, reviewer) |
| Suspending the generator after each yield* and resuming when the child promise resolves | The straight-line orchestrator body (findings, draft, review, approved) |
| Replaying the orchestrator after a crash using cached step values rather than re-running completed agents | No replay code; the orchestrator is written as if it never crashes |
| Retrying a child function passed to ctx.run that throws, and emitting Runtime. Function 'writer' failed with ... (retrying in 2 secs). Only orchestrate is registered (src/index.ts:9); the agents are imported in src/workflow.ts:2 and passed by reference to ctx.run(...), and the SDK applies its default Exponential retry policy to them | The actual failure (throw new Error("Writer agent connection reset (simulated)") at src/agents.ts:58), with no retry decorator and no try/catch |
| Holding the worker on a durable promise via ctx.promise({}) until it is resolved externally | The promise reference (const approval = yield* ctx.promise({})) and the resolver call (POST /promises/<id>/resolve from outside) |

The orchestrator body is three assignments, one boolean, and a return. The retry behaviour, the cached intermediate results, the resume-after-crash semantics, and the blocking on external approval all live in the SDK and (for ctx.promise resolution) the Resonate server, not in the code the author writes. README:103 frames it as "The entire orchestrator is 15 lines."
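For a sense of what the retry side of that division of labour replaces, here is a minimal hand-written exponential-backoff wrapper. It is an illustration only: retryWithBackoff and its options are hypothetical, and the values used (three attempts, a 2000 ms base delay chosen to echo the "retrying in 2 secs" log line, a factor of 2) are assumptions, not the SDK's documented defaults.

```typescript
// Hand-rolled retry with exponential backoff, for illustration only.
// This is NOT the Resonate retry policy; every name and value is an assumption.
type RetryOptions = {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the second attempt
  factor: number; // multiplier applied to the delay after each failure
  sleep: (ms: number) => Promise<void>; // injected so the demo does not really wait
};

async function retryWithBackoff<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let delay = opts.baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Give up once the attempt budget is spent.
      if (attempt >= opts.maxAttempts) throw err;
      await opts.sleep(delay);
      delay *= opts.factor;
    }
  }
}

async function demoRetry(): Promise<{ draft: string; delays: number[] }> {
  const delays: number[] = [];
  const draft = await retryWithBackoff(
    async (attempt) => {
      // Mirrors the example's writer: fail on the first attempt only.
      if (attempt === 1) throw new Error("Writer agent connection reset (simulated)");
      return `draft (attempt ${attempt})`;
    },
    {
      maxAttempts: 3,
      baseDelayMs: 2000,
      factor: 2,
      sleep: async (ms) => {
        delays.push(ms); // record the delay instead of sleeping
      },
    },
  );
  return { draft, delays };
}
```

In the example none of this scaffolding appears in the agent or the orchestrator, because passing writer to ctx.run is enough for the SDK to apply its own policy.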

Failure modes covered

  • Writer throws on its first attempt. src/agents.ts:56-59 throws Error("Writer agent connection reset (simulated)") when crashOnFirst is set and attempt === 1. The SDK retries the writer function that was passed by reference to ctx.run (only orchestrate is registered at src/index.ts:9; writer is imported in src/workflow.ts:2). The README crash-mode transcript (README:70-85) shows [researcher] Complete (312 chars) printed once, then [writer] Writing article (attempt 1), then the Resonate retry log line Runtime. Function 'writer' failed with 'Error: Writer agent connection reset (simulated)' (retrying in 2 secs), then [writer] Writing article (attempt 2) — the researcher does not re-run because its return value is already checkpointed at src/workflow.ts:35.
  • Worker crashes between agents. The orchestrator is invoked with a caller-supplied id (src/index.ts:27-29). On restart, the SDK replays the orchestrator under that id and finds the prior ctx.run(...) results in the promise store; only the unfinished step re-enters the agent function. README:98 states this explicitly: "add process.exit(1) after any agent call and restart — resumes from there".
  • The orchestration is invoked twice with the same id. This is a property of the SDK, not one this example exercises: when resonate.run(runId, ...) is called with a previously-used runId, it resolves against the existing durable promise rather than starting a parallel pipeline. The example regenerates the id per process with orchestration-${Date.now()} at src/index.ts:27, so the deduplication is not actually demonstrated here — a stable caller-side id would make it observable.
  • Human approval never arrives (production extension). Swapping in yield* ctx.promise({}) per src/workflow.ts:47-49 and README:125-130 makes the workflow block indefinitely on the durable promise. The worker can restart, the server can restart — the promise stays in the store until something external POSTs to http://localhost:8001/promises/<approvalPromise.id>/resolve (README:135-138). Port 8001 is the Resonate server's default HTTP port; the URL only works once a Resonate server is running, which the example does not start. The default bun start does not enable this path; it is documented as the swap-in for the server-backed deployment.
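The same-id case above is not exercised by the example, so here is a rough sketch of what a stable caller-side id buys. ToyRunner and stableRunId are hypothetical helpers, not part of the SDK: the runner keeps one in-memory promise per id, and the id is derived from the topic instead of Date.now(), so a repeated invocation attaches to the existing run.

```typescript
// Toy deduplication by run id. NOT the Resonate SDK: ToyRunner and
// stableRunId are hypothetical, and the store is an in-memory Map.
class ToyRunner {
  private runs = new Map<string, Promise<unknown>>();

  // Starts fn under the given id, or returns the promise of the run
  // that already owns that id.
  run<T>(id: string, fn: () => Promise<T>): Promise<T> {
    const existing = this.runs.get(id);
    if (existing) return existing as Promise<T>;
    const started = fn();
    this.runs.set(id, started);
    return started;
  }
}

// Derives a deterministic id from the topic, so that repeated requests
// for the same topic map to the same run.
function stableRunId(topic: string): string {
  const slug = topic
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `orchestration-${slug}`;
}

async function demoDedup(): Promise<{ id: string; starts: number; a: string; b: string }> {
  const runner = new ToyRunner();
  let starts = 0;
  const pipeline = async (): Promise<string> => {
    starts++;
    return "published";
  };
  const id = stableRunId("Durable Execution!");
  // Two invocations with the same id share a single underlying run.
  const [a, b] = await Promise.all([runner.run(id, pipeline), runner.run(id, pipeline)]);
  return { id, starts, a, b };
}
```

Whether a topic-derived id is appropriate depends on the application: it would also collapse two intentionally separate runs on the same topic into one.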

The example does not pass idempotency keys to the Anthropic API. A retry of ctx.run(writer, ...) will call the Anthropic SDK again — retries are at the agent level, not the LLM-call level.

When to reach for this pattern

  • If you are chaining multiple LLM agents in sequence and re-running an earlier agent on a later-step failure is unacceptable (cost, latency, non-determinism).
  • If you want straight-line orchestration code for a sequential agent pipeline instead of a DAG framework or a hosted multi-agent runtime.
  • If the pipeline needs to survive worker restarts mid-run and resume at the failed step.
  • If a downstream step (review, approval, publish) needs to block on a human decision that may take minutes, hours, or days — ctx.promise({}) makes the wait durable across restarts and across services.
  • If you need per-agent retry without writing per-agent retry decorators or try/catch scaffolding around each call.
  • If the agents themselves are plain async functions that happen to call an LLM, and you want the orchestration concern lifted out of them entirely.
