4 min readResonate HQJust published

Fan-out / fan-in across notification channels in Python on Resonate

How the fan-out / fan-in pattern collapses to four `yield ctx.rfi(...)` calls plus four `yield <handle>` fan-in points when every branch is a durable promise.

Resonate brand card on a dark background with a plum spectrum wave at the bottom and the post headline in white Sansation.

A single order.confirmed event has to notify the customer over four independent channels — email, SMS, Slack, push — and total time should be the slowest channel, not the sum, while a transient failure on any one channel must not cause the others to re-send. The Resonate shape of the solution is to start each channel as a Remote Function Invocation (ctx.rfi) keyed by a deterministic per-channel id, then yield each handle in turn to fan back in; every branch is a durable promise checkpointed independently. The repo pins resonate-sdk>=0.6.7 and runs against resonate serve from resonatehq/resonate-legacy-server (README:48-50), so the surface shown here is the pre-0.10 Python SDK API. The example demonstrates the workflow under the happy path and under a forced first-attempt failure on the push channel.

The shape of the solution

def notify_all(
    ctx: Context,
    event: dict[str, Any],
    simulate_crash: bool,
) -> Generator[Any, Any, dict[str, Any]]:
    start = time.time()
 
    # Fan-out: start all 4 channels simultaneously.
    # ctx.rfi(...) returns a handle immediately — no blocking.
    email_p = yield ctx.rfi(send_email, event).options(
        id=f"{ctx.id}.email",
    )
    sms_p = yield ctx.rfi(send_sms, event).options(
        id=f"{ctx.id}.sms",
    )
    slack_p = yield ctx.rfi(send_slack, event).options(
        id=f"{ctx.id}.slack",
    )
    push_p = yield ctx.rfi(send_push, event, simulate_crash).options(
        id=f"{ctx.id}.push",
    )
 
    # Fan-in: await each result.
    # If push fails and retries, the other channels are already checkpointed.
    email = yield email_p
    sms = yield sms_p
    slack = yield slack_p
    push = yield push_p
 
    results = [email, sms, slack, push]
 
    return {
        "order_id": event["order_id"],
        "channels_notified": sum(1 for r in results if r.get("success")),
        "total_ms": int((time.time() - start) * 1000),
        "results": results,
    }
# from example-fan-out-fan-in-py/workflow.py:29-65

The workflow is a generator function, not async def. yield ctx.rfi(...) returns a handle to the durable promise the SDK created for that branch; yield <handle> later in the function is where the workflow blocks on the result. The body has eight yield keywords total — four on the rfi-call lines, four on the fan-in lines.

The durable primitives in play

  • Resonate.remote() — constructs a client that talks to a Resonate server (the legacy server, per the SDK pin in the opener). main.py:25.
  • resonate.register(fn) — registers a top-level workflow or a child function so the worker can claim and execute it. The wrapper returned for the workflow exposes .run(id, *args). main.py:29-33.
  • workflow.run(id, *args) — starts the workflow under a caller-supplied promise id (f"notify/{event['order_id']}"), giving the run a stable handle for replay. main.py:58-62.
  • ctx.rfi(fn, *args) — Remote Function Invocation. Creates a durable promise for the child branch and returns a handle without blocking. workflow.py:38-49.
  • .options(id=...) — pins the child promise to a deterministic id (f"{ctx.id}.email", etc.), so on replay the SDK looks up the existing branch rather than starting a new one. workflow.py:38-49.
  • yield <handle> — the fan-in. The workflow suspends here until the branch's durable promise resolves; the SDK then feeds the resolved value back into the generator. workflow.py:53-56.

What the SDK handles vs. what you write

SDK handlesYou write
Creating one durable promise per ctx.rfi(...) and persisting it in the promise storeThe four ctx.rfi(...) calls (one per channel)
Suspending the workflow on yield <handle> and resuming when the promise resolvesThe yield <handle> fan-in points
Storing each branch's return value as soon as it resolves so that a retry does not re-execute itThe channel functions that produce that return value (send_email, send_sms, send_slack, send_push)
Retrying a registered function that raised, while leaving sibling branches untouchedThe actual failure mode in send_push (a RuntimeError on attempt 1 when simulate_crash is set)
Mapping the deterministic per-branch id to its stored result on replayThe id template f"{ctx.id}.<channel>" passed to .options(id=...)

The workflow body reads like straight-line Python. The branching, persistence, and per-step retry are not in the code you write — they are in the SDK + server.

Failure modes covered

  • The push channel raises on its first attempt. channels.py:131-134 throws RuntimeError("Push service unavailable — will retry") when simulate_crash is set and attempt == 1. Push has the shortest latency (time.sleep(0.12) at channels.py:129), so on the recorded crash run the push attempt fails before the other three channels finish: the README crash block (lines 110-121) shows push attempt 1 fails (line 115), then slack/sms/email complete on their own branches (the three completion lines at 116-118 — two Sent — msg_... lines plus one Posted — msg_slack_...), then push attempt 2 fires (lines 119-120). Email, SMS, and Slack are not re-invoked when push retries because each branch is keyed by its own deterministic id (f"{ctx.id}.<channel>", workflow.py:38-49) and lives in its own durable promise; the SDK retries only the failed send_push step.
  • The worker crashes mid-workflow. Because each branch promise carries a deterministic id (f"{ctx.id}.<channel>") and the workflow itself runs under a caller-supplied id (f"notify/{event['order_id']}", main.py:59), a replay of the workflow looks up the existing per-channel promises rather than issuing new sends. Branches that had already resolved feed their stored results back into the generator; branches still in flight resume on whichever worker claims them.
  • The workflow is invoked twice with the same order id. The outer promise id is keyed on order_id (main.py:59); the second call resolves against the existing promise rather than starting a parallel run.

The example does not implement provider-level idempotency on the channel side — that lives outside the workflow and is not claimed.

When to reach for this pattern

  • If a single trigger has to drive N independent side effects and total latency should track the slowest one, not the sum.
  • If partial failure must be tolerated — one branch failing and retrying without re-driving the successful branches.
  • If the branches are heterogeneous (different providers, different latencies, different retry profiles) but the orchestration is uniform.
  • If you want straight-line workflow code rather than a DAG declaration or a child-workflow framework.
  • If each branch's result needs to be durably captured so replays after a crash do not re-issue calls already acknowledged by the provider.

Sources