Resonate HQ, 5 min read

Async RPC across services in Python on Resonate

Three remote-invocation shapes — rfc, detached, rfi — across a Flask gateway and nine Python service groups, with crash recovery.


A multi-service request flow that crosses process boundaries needs to survive any node crashing mid-flight without redoing completed work. Resonate models the entire cross-service call graph as a durable promise rooted at the gateway, so each remote invocation is a checkpoint and any node in the same group can resume an in-flight call. This example exposes three HTTP routes on a Flask gateway and maps each one to a durable remote-invocation shape: ctx.rfc (await chain), ctx.detached (fire-and-forget chain), or ctx.rfi (fan-out with promises). Nine Python services back the three flows.

The shape of the solution

The gateway is the ephemeral-to-durable boundary. Every Flask route uses resonate.options(target=...).rpc(promise_id, func, *args) with a hard-coded promise_id and a target of the form poll://<group>:

@app.route("/await-chain", methods=["POST"])
def await_chain_route_handler():
    try:
        print("running await_chain_route_handler")
        promise_id = "await-chain"
        handle = resonate.options(target="poll://service-a").rpc(promise_id, "foo")
        print("waiting on result")
        message = handle.result()
        return jsonify({"message": message}), 200
    except Exception as e:
        print(e)
        return jsonify({"error": str(e)}), 500
# from example-async-rpc-py/src/gateway.py:20

Inside the durable call graph, the three flows differ only in which Context method they call. The fan-out workflow uses two rfi calls back-to-back to overlap the remote invocations of rax and dop:

@resonate.register
def zim(ctx, arg):
    print("running function zim")
    promise_bar = yield ctx.rfi("rax").options(target="poll://service-h")
    promise_baz = yield ctx.rfi("dop").options(target="poll://service-i")
    result_bar = yield promise_bar
    result_baz = yield promise_baz
    return result_bar + result_baz + arg
# from example-async-rpc-py/src/service_g.py:22
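To make the ordering in zim concrete, here is a minimal pure-Python sketch of a generator driver. It is a toy model, not the Resonate SDK: Invocation, Promise, REMOTE, drive, and zim_like are hypothetical names, and the "remote" work runs eagerly in-process. It only shows that both invocations start before either result is awaited.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """Hypothetical stand-in for what an rfi-style call yields: a request to start work."""
    func: str


@dataclass
class Promise:
    """Hypothetical stand-in for the promise handed back to the generator."""
    func: str
    value: object = None


# Hypothetical in-process replacements for the remote functions rax and dop.
REMOTE = {"rax": lambda: 1, "dop": lambda: 2}


def drive(gen, log):
    """Run a generator to completion, logging when work starts and when it is awaited."""
    to_send = None
    try:
        while True:
            yielded = gen.send(to_send)
            if isinstance(yielded, Invocation):
                # Starting work returns a promise immediately (computed eagerly in this toy).
                log.append(f"start {yielded.func}")
                to_send = Promise(yielded.func, REMOTE[yielded.func]())
            elif isinstance(yielded, Promise):
                # Yielding a promise hands its value back to the generator.
                log.append(f"await {yielded.func}")
                to_send = yielded.value
            else:
                raise TypeError(f"unexpected yield: {yielded!r}")
    except StopIteration as stop:
        return stop.value


def zim_like(arg):
    """Same shape as zim: start both invocations, then await both promises."""
    promise_rax = yield Invocation("rax")
    promise_dop = yield Invocation("dop")
    result_rax = yield promise_rax
    result_dop = yield promise_dop
    return result_rax + result_dop + arg


log = []
result = drive(zim_like(10), log)
print(result, log)
```

In this toy the log shows both "start" entries ahead of both "await" entries, which is the property that lets the real remote calls overlap.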

The durable primitives in play

  • resonate.options(target=...).rpc(promise_id, func, *args) — the ephemeral-to-durable entry point. Returns a handle whose .result() blocks on the entire durable call graph. The static promise_id ("await-chain", "detached-chain", "fan-out-workflow") means a re-sent request reconnects to the same in-flight invocation. gateway.py:25, gateway.py:39, gateway.py:52.
  • ctx.rfc(func) — Remote Function Call. The generator yields and is paused until the remote function returns, then receives the result inline. Used to chain foo → bar → baz. service_a.py:22, service_b.py:22.
  • ctx.detached(func, arg) invokes a remote function in a new Call Graph detached from the caller's. It returns an RFI (mode="detached") whose yield resolves to a durable promise. The caller can yield on that promise to retrieve the callee's return value (the same shape as ctx.rfi) or discard it for fire-and-forget. The example chains qux → quz → cog fire-and-forget style: service_d.py:21 discards the promise (an unassigned yield), while service_e.py:22 captures it as result but never yields on it. In quz, result is therefore a Promise handle rather than a value, and return result + 1 would matter only if something awaited quz's return value, which nothing does. The final chain value is printed by cog itself on service-f (service_f.py:22) instead of being returned up the chain.
  • ctx.rfi(func) — Remote Function Invocation. Returns a durable promise that can be yielded on at any later point in the generator. zim issues two rfi calls back-to-back and then yields on both promises, producing in-parallel execution of rax and dop. service_g.py:25, service_g.py:26.
  • Application Node identity (group + id) — each service constructs Resonate with a hard-coded app_node_group (e.g., "service-a") and a fresh uuid.uuid4() app_node_id so multiple instances can share a group for anycast routing via target="poll://<group>". service_a.py:8–16 (pattern repeated across service_b..service_i). The gateway is the exception: gateway.py:7–8 hard-codes both app_node_id = "gateway" and app_node_group = "gateway" because it is a single unicast node that initiates calls but does not receive them.
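The static promise_id behavior described in the first bullet can be pictured with a small in-memory sketch. This is an illustrative assumption, not how the Resonate Server is implemented: PromiseStore and its create method are hypothetical names. The idea is that creating a promise under an ID that already exists returns the existing promise instead of starting a second invocation.

```python
class PromiseStore:
    """Toy, in-memory stand-in for a durable promise store keyed by promise ID."""

    def __init__(self):
        self._promises = {}

    def create(self, promise_id, func):
        """Return (promise, created); created is False when the ID already exists."""
        if promise_id in self._promises:
            return self._promises[promise_id], False
        promise = {"id": promise_id, "func": func, "state": "pending"}
        self._promises[promise_id] = promise
        return promise, True


store = PromiseStore()

# First request starts the invocation.
first, first_created = store.create("await-chain", "foo")

# A re-sent request with the same ID attaches to the same pending promise.
second, second_created = store.create("await-chain", "foo")

print(first_created, second_created, first is second)
```

A per-request random ID would defeat this: every retry would open a fresh call graph, which is why the example hard-codes one ID per route.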

What the SDK handles vs. what you write

You write three things: the Flask route handlers that call resonate.options(target=...).rpc(...), the generator functions registered with @resonate.register, and the per-node identity (app_node_group, app_node_id). Each durable function is a plain Python generator that yields on a Context method to invoke remote work — there is no transport code, no message broker setup, no per-call retry wrapper, no shared correlation IDs to thread through requests, and no try/except inside durable functions (README.md:71).

The SDK handles the rest: durably persisting the call graph and per-invocation arguments to the Resonate Server, routing each call to a node in the target group via the poll://<group> address, awaiting and resolving cross-process promises, replaying the generator from the last checkpoint after a crash, automatically retrying functions that raise, and reconnecting a re-sent top-level invocation (same promise_id) to the existing in-flight durable promise.
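Replaying a generator from its last checkpoint can be sketched in plain Python. The sketch below is a simplified model under stated assumptions, not the SDK's implementation: run_durably, the list-based journal, the crash_after switch, and the step_one and step_two functions are all hypothetical. Each yielded step records its result in the journal, and a rerun feeds recorded results back into a fresh generator instead of executing those steps again.

```python
def run_durably(workflow, arg, journal, remote, crash_after=None):
    """Drive a generator workflow, reusing journaled results for completed steps."""
    gen = workflow(arg)
    step = 0
    to_send = None
    try:
        while True:
            func_name = gen.send(to_send)
            if step < len(journal):
                # Replay: this step finished in an earlier run, so reuse its result.
                to_send = journal[step]
            else:
                if crash_after is not None and step >= crash_after:
                    raise RuntimeError("simulated crash")
                to_send = remote[func_name]()
                journal.append(to_send)  # checkpoint the completed step
            step += 1
    except StopIteration as stop:
        return stop.value


calls = {"step_one": 0, "step_two": 0}


def step_one():
    calls["step_one"] += 1
    return 1


def step_two():
    calls["step_two"] += 1
    return 2


remote = {"step_one": step_one, "step_two": step_two}


def workflow(arg):
    a = yield "step_one"
    b = yield "step_two"
    return a + b + arg


journal = []
try:
    # First run: crash after one step has been checkpointed.
    run_durably(workflow, 10, journal, remote, crash_after=1)
except RuntimeError:
    pass

# Second run: step_one is replayed from the journal, only step_two executes.
result = run_durably(workflow, 10, journal, remote)
print(result, calls, journal)
```

The design point is that the workflow function itself contains no recovery logic. The driver owns the journal, so the generator can be restarted from the top and still skip work that already completed.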

Failure modes covered

  • A service node crashes mid-call. Each ctx.rfc / ctx.rfi / ctx.detached call is a durable checkpoint. Another node in the same group, whether already running or newly started, claims the durable promise and resumes from the last completed step. Documented in README.md:82–99; demonstrable by injecting yield ctx.sleep(10) into any function (README.md:91–96) and killing the process during the sleep.
  • The gateway crashes after handing off to rpc(...). The durable call graph continues to make progress in the service groups. Because the route handler uses a static promise_id (gateway.py:24, :38, :51), a re-sent cURL request invokes rpc with the same id and the SDK returns a handle attached to the existing in-flight promise rather than starting a new one (README.md:98–99).
  • A durable function raises. Inside the durable call graph the SDK catches the error and retries the function automatically — no try/except is needed in foo, bar, baz, qux, quz, cog, zim, rax, or dop (README.md:71). The only try/except blocks in the codebase are the three Flask route handlers (gateway.py:22, :36, :49), where the ephemeral HTTP request can't be resumed.
  • The Flask process crashes before the detached chain completes. The detached chain does not depend on the gateway for completion — qux detaches quz, quz detaches cog, and cog prints the value on service-f (service_f.py:22). The /detached-chain handler discards the rpc handle without calling .result() (gateway.py:39–40) and returns "detached-chain started" immediately, so the gateway is not in the result path.
  • Two service-a nodes are running at the same time. The poll://service-a target is anycast — only one node in the group claims a given invocation (README.md:80). Adding nodes to a group is the horizontal-scaling and high-availability story; killing nodes is the recovery story.
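The anycast claim and the recovery hand-off in the last bullet can be modeled with a lock-protected claim table. This is a toy illustration only: GroupQueue, claim, release, and owner are hypothetical names, and the real routing happens between the Resonate Server and the nodes rather than in shared memory. Two nodes with fresh uuid4 identities race for one invocation, then the loser takes over once the winner is gone.

```python
import threading
import uuid


class GroupQueue:
    """Toy stand-in for a poll://<group> address: many nodes poll, one claims each task."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed_by = {}

    def claim(self, task_id, node_id):
        """Return True only for the first node that claims an unowned task."""
        with self._lock:
            if task_id in self._claimed_by:
                return False
            self._claimed_by[task_id] = node_id
            return True

    def release(self, task_id, node_id):
        """Drop a claim, as if the owning node had crashed."""
        with self._lock:
            if self._claimed_by.get(task_id) == node_id:
                del self._claimed_by[task_id]

    def owner(self, task_id):
        with self._lock:
            return self._claimed_by.get(task_id)


queue = GroupQueue()
node_ids = [str(uuid.uuid4()) for _ in range(2)]  # two nodes in one group
wins = {}


def poll(node_id):
    wins[node_id] = queue.claim("await-chain", node_id)


threads = [threading.Thread(target=poll, args=(n,)) for n in node_ids]
for t in threads:
    t.start()
for t in threads:
    t.join()

winner = queue.owner("await-chain")

# Simulate the winner crashing: its claim is released and the survivor takes over.
queue.release("await-chain", winner)
survivor = next(n for n in node_ids if n != winner)
reclaimed = queue.claim("await-chain", survivor)
print(sum(wins.values()), reclaimed)
```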

When to reach for this pattern

  • If you have an HTTP gateway that fans work out to multiple downstream services, use this pattern to make crashes anywhere in the chain recoverable rather than lost.
  • If you need a request flow that a re-sent client request can resume, use a static promise_id (same promise_id in, same durable promise out).
  • If the work fans out in parallel across services and the caller needs to combine results, use ctx.rfi to grab promises up front and yield each promise when you need its value.
  • If the work is a chain where each step hands off to the next without anyone waiting on the tail, use ctx.detached and let the final node print, persist, or notify.
  • If the work is a synchronous service-to-service chain where each step needs the next step's return value, use ctx.rfc.
  • If you are running multiple instances of the same service for HA and want anycast routing without standing up a load balancer, give every instance the same app_node_group and target poll://<group>.

Sources