A FaaS-style platform needs to accept a function submission, route it to a worker that can run it (a GPU node, a CPU node, a node with a specific accelerator), and return a result whose delivery survives worker crashes. Resonate models this as a durable orchestration on a router process that issues remote calls to a named worker group; the durable promise lives on the Resonate server, so a crashed worker is replaced by another worker in the same group without losing the submission. The example-function-as-a-service-py repo ("Modulate") demonstrates this with one router (modulate.py), one worker (worker.py) hard-coded to the gpu group, and three invocation modes — RPC, RFI, and ctx.detached — selected per submission via the router's wait flag. The repo accepts a --machine CLI flag but does not yet wire it into the poll(...) target; both branches dispatch to the gpu group. Multi-group routing is described in the README's architecture diagram but is not implemented in the code at HEAD; wiring machine_type into the target is left as an extension.
SDK-version note. pyproject.toml:7 pins resonate-sdk>=0.6.7, but the committed source still uses pre-v0.6.7 import paths and kwargs: resonate.targets (removed), resonate.task_sources.poller (renamed to resonate.message_sources.poller), resonate.retry_policy.never (now resonate.retry_policies.Never), the task_source= kwarg (now message_source=), and .options(send_to=poll("gpu")) (now .options(target="poll://any@gpu/<worker-id>"), produced by Poller(...).anycast). The code blocks below are lifted verbatim from the repo for traceability and will not import as-is against the pinned SDK; the orchestration shape they demonstrate is the same on both surfaces.
The shape of the solution
The router's orchestrating function reads the script, then branches on whether the caller wants to block for a result (RPC) or fire-and-forget via a detached child (RFI):
@resonate.register(retry_policy=never())
def prep_execute(ctx: Context, id, script, machine_type, wait):
try:
content = ""
with open(script, "r") as file:
content = file.read()
print("Sending script to be executed in gpu")
if wait:
result = yield ctx.rfc(execute, content, id).options(
id=id, send_to=poll("gpu")
)
return result
else:
detached_id = f"detached_{id}"
yield ctx.detached(detached_id, detached_rfi, content, id, "gpu")
return None
except FileNotFoundError as ef:
print("Error: File not found.")
raise ef
except Exception as e:
print(f"An error occurred: {e}")
raise e
# from example-function-as-a-service-py/modulate.py:30-53execute is registered on the router as a stub whose body is ..., so its name is in the local registry but the real body lives on the worker:
@resonate.register()
def execute(ctx: Context, script_content, id): ...
# from example-function-as-a-service-py/modulate.py:20-21The real body is the worker-side execute:
@resonate.register()
def execute(ctx: Context, script_content, script_id):
# ...
temp_dir = tempfile.mkdtemp()
# ...
result = subprocess.run(
["python", "-I", "-S", script_path],
cwd=temp_dir,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
timeout=120,
check=True,
)
# ...
return output_filename
# from example-function-as-a-service-py/worker.py:15-76Router and worker are separate processes pinned to different polling groups. The router is in the entry group; the worker is hard-coded to gpu:
# router process
resonate = Resonate(store=RemoteStore(), task_source=Poller(group="entry"))
# from example-function-as-a-service-py/modulate.py:10
# worker process
resonate = Resonate(store=RemoteStore(), task_source=Poller(group="gpu"))
# from example-function-as-a-service-py/worker.py:12The README documents uv run python worker.py --group gpu (README.md:113), but worker.py has no argparse and no CLI flag — group selection requires editing worker.py:12. Treat the README flag as aspirational.
The durable primitives in play
@resonate.register(retry_policy=never())— registersprep_executeas a durable function with retries disabled, so aFileNotFoundErrorfrom the script-load step is fatal rather than replayed against the bad path.modulate.py:30-31.ctx.rfc(execute, content, id).options(id=id, send_to=poll("gpu"))— Remote Function Call: yields the coroutine, blocksprep_executeuntil the promise identified byidresolves on the Resonate server, and returns the worker's return value. Thesend_to=poll("gpu")directs the invocation to thegpuworker group's poller.modulate.py:38-42.ctx.detached(detached_id, detached_rfi, content, id, "gpu")— starts a new top-level call graph not tied to the caller's lifetime. The router can returnNoneimmediately and the detached child handles the actual remote invocation.modulate.py:43-46.ctx.rfi(execute, content, id).options(id=id, send_to=poll("gpu"))— Remote Function Invocation inside the detached child: fire-and-forget, noyield-blocking on the result.modulate.py:24-27.resonate.promises.get(id=id)— non-durable read of the promise store from outside any orchestration; used by the--getCLI flag to poll for completion.modulate.py:13-17.Poller(group=<name>)task source — each process polls Resonate for tasks addressed to its group. The router pollsentry; the worker pollsgpu.modulate.py:10,worker.py:12. Group is set in source, not at CLI time.prep_execute.run(execution_id, id, script, machine_type, wait)thenhandle.result()— the CLI entrypoint starts the durable orchestration and blocks for its return value.modulate.py:101-105.
What the SDK handles vs. what you write
You write the orchestration as straight-line Python with yield for the three remote primitives (rfc, rfi, detached). You write the per-group worker registration and the actual execution body. You write the CLI surface that turns --script foo.py --machine gpu into a prep_execute.run(...) call.
The SDK does the rest. The Resonate server holds the durable promise keyed by the submission id. The Poller running inside each process subscribes to its group's queue and pulls work. When prep_execute yields an rfc, the SDK serialises the call, writes it to the server as a durable promise targeting poll("gpu"), and pauses the coroutine. A gpu worker's poller picks up the task, runs execute, and resolves the promise with the return value (or rejects it with an exception). The SDK resumes prep_execute with the value, on whatever router replica is alive at that moment — not necessarily the one that originally yielded. None of the routing, queueing, promise persistence, or replay is in the application code.
The split also explains the execute = ... stub at modulate.py:20-21: the router never executes execute locally, but its function name must be in the local registry so ctx.rfc(execute, ...) can resolve the registered name to send to the wire.
Failure modes covered
- Worker crashes mid-execute. The durable promise lives on the Resonate server, not on the worker. When the worker dies, the lease on the task expires and another worker in the same group picks it up on its next poll. The router's
yield ctx.rfc(...)is unaffected — it was already suspended waiting for the promise. The promise resolves once for the submission as a whole, but the worker-sideexecutebody can run more than once if a crash happens after side effects but before the promise is resolved; the worker's only externally-visible side effects are the per-script_id.sout/.eoutfiles (worker.py:26-27, 66-72) and the sandboxed subprocess, both of which are safe to re-run on a clean temp directory. The README phrases this as "exactly-once execution semantics" (README.md:240-245), which is accurate for the promise but not for the body. This is a property of the durable-promise + group-polling design, not of code paths in this repo. - Router process restarts mid-orchestration.
prep_executeis registered durable. When the router restarts, the SDK replays the coroutine from the last yield checkpoint; the openrfcis still on the server as an unresolved promise, so the replay re-attaches to the existing promise rather than re-submitting work.modulate.py:30, 38-42. - Caller wants fire-and-forget with no live coupling.
wait=Falsetakes thectx.detachedbranch (modulate.py:43-46), which spawns a new top-level call graph; the router'sprep_executereturnsNoneand exits, but the detacheddetached_rficontinues to drive the actualrfiagainst thegpugroup. The submission can be queried later viaresonate.promises.get(id=id)(modulate.py:13-17). - Bad script path.
prep_executeis registered withretry_policy=never()(modulate.py:30). Theopen(script, "r")atmodulate.py:34-35raisesFileNotFoundError(modulate.py:48-50); the orchestration fails immediately instead of replaying against a path that will keep failing. - User script misbehaves on the worker. The worker sandboxes the script in a
tempfile.mkdtemp()directory (worker.py:25) and runs it withsubprocess.run(["python", "-I", "-S", ...], timeout=120, check=True)(worker.py:39-47).subprocess.TimeoutExpiredandCalledProcessErrorare caught and reported (worker.py:52-55); the temp directory is always cleaned up infinally(worker.py:58-60). Note thatworker.py:53reports "Error: Execution timed out after 5 seconds" while the actualtimeout=argument is 120 seconds — the error string lags the timeout value and should be read as "timed out".
When to reach for this pattern
- If you need to route function executions to a worker group and the routing decision is made at submission time. Wiring multiple groups (e.g. selecting between
gpuandcpufrommachine_type) requires threading the argument into thetargetkwarg; this example fixes the target togpuand would need that extension. - If a submission must outlive the router process — the caller wants to disconnect and come back later for the result via an ID.
- If you want one orchestration entrypoint to support both blocking (RPC) and fire-and-forget (RFI / detached) submission modes, selectable per call.
- If workers in a group must be interchangeable and crashes during execution must transparently re-dispatch to a peer.
- If you need both a durable submission protocol and an out-of-band polling API for status, served from the same promise store.
Sources
- Example repo:
https://github.com/resonatehq-examples/example-function-as-a-service-py - Resonate Python SDK:
https://github.com/resonatehq/resonate-sdk-py(tagv0.6.7matchespyproject.toml:7) - SDK primitives referenced (v0.6.7):
Context.rfi—resonate-sdk-py/resonate/resonate.py:1037-1057Context.rfc—resonate-sdk-py/resonate/resonate.py:1060-1080Context.detached—resonate-sdk-py/resonate/resonate.py:1083-1127Function.options(target=...)—resonate-sdk-py/resonate/resonate.py:1314-1366Poller.anycast(v0.6.7 replacement fortargets.poll(...)) —resonate-sdk-py/resonate/message_sources/poller.py:52-54Neverretry policy —resonate-sdk-py/resonate/retry_policies/__init__.py:6
- Resonate docs:
https://docs.resonatehq.io/sdk/python(Python SDK reference),https://docs.resonatehq.io/concepts/rpc-rfi(RPC vs RFI background, linked from the example README).
