A single worker process eventually saturates, and running multiple instances forces the caller to solve service discovery, load balancing, and crash recovery for a pool that may grow, shrink, or have a member die mid-execution. Resonate's shape of the solution is to tag each worker process with a group name on construction and let the caller address an RPC to "any member" of that group through the durable promise store; the server then routes the invocation to a live worker and re-routes it if the claimant disappears before completing. The example runs N copies of worker.py in "worker-group", then fires repeated non-blocking begin_rpc calls from invoke.py against poll://any@worker-group.
The shape of the solution

    from resonate import Resonate
    from threading import Event
    import time

    # Initialize an instance of Resonate as a worker in "worker-group"
    resonate = Resonate.remote(
        group="worker-group",
    )

    # Register the function with Resonate
    @resonate.register
    def compute_something(_, id, compute_cost):
        """A function that simulates a computation that takes some time."""
        print(f"starting computation {id}")
        # Using time.sleep(), instead of ctx.sleep(), blocks the thread, simulating a time-consuming task
        time.sleep(compute_cost)
        print(f"computed something that cost {compute_cost} seconds")
        return


    def main():
        # Explicitly start Resonate instance threads
        resonate.start()
        print("worker running...")
        # Keep the main thread alive to allow async tasks to complete
        Event().wait()

From example-load-balancing-py/worker.py:1-26

    from resonate import Resonate
    from uuid import uuid4
    from random import randint

    # Initialize an instance of Resonate as a client in "invoke-group"
    resonate = Resonate.remote(
        group="invoke-group",
    )


    def main():
        # Generate a random compute cost between 1 and 10 seconds
        compute_cost = randint(1, 10)
        # Generate a random promise ID
        promise_id = str(uuid4())
        # Invoke compute_something() on the worker group
        _ = resonate.options(target="poll://any@worker-group").begin_rpc(promise_id, "compute_something", promise_id, compute_cost)

From example-load-balancing-py/invoke.py:1-17

compute_something is a regular function (not a generator): the example does not use ctx.run, ctx.sleep, or any context primitive. The durability and routing work happens entirely at the boundaries: at registration on the worker, and at the RPC target on the caller.
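To make the registration boundary concrete, here is a minimal sketch of name-keyed dispatch. It is a toy in-process model and not the SDK's Registry: ToyRegistry, register, and dispatch are hypothetical names, and the sketch ignores versions and passes None where a real runtime would pass a context object.

```python
from typing import Any, Callable, Dict


class ToyRegistry:
    """Hypothetical stand-in for a worker-local function registry."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Key the function on its Python name, as the worker example does.
        self._functions[func.__name__] = func
        return func

    def dispatch(self, name: str, *args: Any) -> Any:
        # The caller only ever sends a name string plus arguments.
        if name not in self._functions:
            raise LookupError(f"no function registered as {name!r}")
        # None stands in for the context argument (assumption of this sketch).
        return self._functions[name](None, *args)


registry = ToyRegistry()


@registry.register
def compute_something(_, id, compute_cost):
    return f"{id} cost {compute_cost}"


result = registry.dispatch("compute_something", "job-1", 3)
print(result)
```

The point of the sketch is that the caller never imports the function: the name string is the only contract between the two processes.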
The durable primitives in play
- Resonate.remote(group="worker-group"): constructs a Resonate client that connects to a Resonate Server as both durable promise store and anycast message source, and joins the named group. worker.py:6-8. SDK constructor at v0.6.7: resonate-sdk-py/resonate/resonate.py:194-254.
- @resonate.register: registers compute_something under its Python name and version 1 in the worker's local Registry, so the server can route an invocation keyed on the name string "compute_something" to this process. worker.py:11. SDK: resonate-sdk-py/resonate/resonate.py:316-399 (v0.6.7), with overload stubs at 316-331 and the implementation at 332-399.
- resonate.start(): starts the worker's bridge, processor threads, and the Poller message source that long-polls the Resonate Server for tasks claimed against this group. worker.py:23. SDK: resonate-sdk-py/resonate/resonate.py:266-274 (v0.6.7).
- resonate.options(target="poll://any@worker-group"): returns a per-call copy of the client with the RPC target overridden to anycast against the named group. invoke.py:17. The poll://any@<group> form is the SDK's anycast convention; see resonate-sdk-py/resonate/message_sources/poller.py:52-54 and resonate-sdk-py/resonate/conventions/remote.py:59-61.
- resonate.begin_rpc(id, "compute_something", promise_id, compute_cost): creates a durable promise keyed on id, dispatches the invocation to the targeted group, and returns a Handle[R] without blocking. The example discards the handle. invoke.py:17. SDK: resonate-sdk-py/resonate/resonate.py:633-713 (v0.6.7).
- Worker ttl on claimed tasks: defaults to 10 seconds (ttl=10 in Resonate.remote's signature). When a worker claims a task it holds the claim for at most ttl seconds; if it dies without completing, the task becomes re-claimable by another member of the group. resonate-sdk-py/resonate/resonate.py:206 (v0.6.7).
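The target string is easier to reason about once it is split into its parts. The following is an illustrative parser built on the standard library, assuming only the poll://any@<group> and poll://uni@<group> forms named above; PollTarget and parse_poll_target are hypothetical names and this is not how the SDK itself parses targets.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class PollTarget(NamedTuple):
    mode: str   # "any" (anycast) or "uni" (unicast)
    group: str  # the worker group name


def parse_poll_target(target: str) -> PollTarget:
    """Split a poll://<mode>@<group> string into its parts (illustrative only)."""
    parts = urlsplit(target)
    if parts.scheme != "poll":
        raise ValueError(f"expected a poll:// target, got {target!r}")
    # netloc holds everything between "//" and the next "/", e.g. "any@worker-group".
    mode, sep, group = parts.netloc.partition("@")
    if not sep or mode not in ("any", "uni") or not group:
        raise ValueError(f"expected poll://any@<group> or poll://uni@<group>, got {target!r}")
    return PollTarget(mode, group)


parsed = parse_poll_target("poll://any@worker-group")
print(parsed.mode, parsed.group)
```

A check like this can be useful in caller-side tests, so that a typo in the group name or mode fails fast instead of producing a task that no worker ever claims.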
What the SDK handles vs. what you write
| SDK handles | You write |
|---|---|
| Discovering which worker instances are currently in worker-group and routing each RPC to one of them | group="worker-group" on the worker, target="poll://any@worker-group" on the caller |
| Persisting a durable promise per invocation keyed on the caller-supplied promise_id, so a duplicate begin_rpc with the same id resolves against the existing promise rather than starting a new execution | The promise_id = str(uuid4()) (invoke.py:15), or any deterministic id you want to use as an idempotency key |
| Holding the task claim for ttl seconds and re-routing it to another worker if the claimant dies | Nothing; ttl=10 is the default |
| Long-polling the Resonate Server for tasks addressed to worker-group from each worker process | Calling resonate.start() once per worker (worker.py:23) |
| Resolving the function name string "compute_something" to the registered function in the receiving worker's registry | Registering the function once with @resonate.register (worker.py:11) and invoking by name string (invoke.py:17) |
The caller does not know how many workers exist, where they run, or which one will pick up a given call. The worker code has no scheduling logic, no health endpoint, no service-discovery client. Both ends are written as if a single in-process function call were happening, and the durable promise plus the anycast target are doing the routing and recovery.
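The "any member" idea can be modelled in a few lines with a shared queue and threads. This is only an in-process analogy, assuming a single FIFO queue per group; in the real system the routing lives in the Resonate Server and workers are separate processes. run_demo and the worker names are hypothetical.

```python
import queue
import threading
from typing import Dict


def run_demo(num_workers: int, num_tasks: int) -> Dict[str, str]:
    """Push tasks at a shared group queue and record which worker took each one."""
    group_queue: queue.Queue = queue.Queue()  # stands in for one group's server-side queue
    handled: Dict[str, str] = {}              # task id -> name of the worker that ran it
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            task_id = group_queue.get()
            if task_id is None:  # sentinel: stop this worker
                return
            with lock:
                handled[task_id] = name

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()

    # The caller addresses the group, never an individual worker.
    for i in range(num_tasks):
        group_queue.put(f"task-{i}")

    # One sentinel per worker, queued after all real tasks.
    for _ in threads:
        group_queue.put(None)
    for thread in threads:
        thread.join()
    return handled


assignments = run_demo(num_workers=3, num_tasks=12)
print(sorted(set(assignments.values())))
```

Which worker handles which task varies from run to run. The caller code is identical whether the pool has one member or ten.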
Failure modes covered
- No worker is available when the invocation lands. The durable promise is created on the server when begin_rpc is called. If no worker in worker-group is currently claiming, the task waits in the server queue until one is, then gets routed. invoke.py:17 plus the server's anycast routing.
- A worker crashes after claiming the task but before completing it. The claim expires after the worker's ttl (10 seconds by default; resonate-sdk-py/resonate/resonate.py:206), the task becomes re-claimable, and another worker in the group picks it up. The README confirms this explicitly: "If you kill one of the workers while it is in the middle of handling executions, you will see the executions recover on another worker" (README.md:77).
- The caller fires the same promise_id twice. begin_rpc is keyed on id. A second call with the same id resolves against the existing durable promise rather than triggering a second execution. The example happens to use a fresh uuid4() per call (invoke.py:15), which means each invocation is intentionally a new execution; the dedupe guarantee is still there if a caller supplies a stable id.
- The caller process exits immediately after dispatching. The invocation has already been persisted as a durable promise before begin_rpc returns the handle. Discarding the handle (_ = ...) is safe: the worker will run the function and resolve the promise on the server regardless of whether anyone is still subscribed. invoke.py:17.
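The claim-expiry and dedupe behaviours above can be sketched as a small state machine. This is a toy model under stated assumptions: time is passed in explicitly as a number of seconds, ToyServer and its methods are hypothetical, and it models only what the list describes rather than the server's actual task protocol.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    promise_id: str
    claimed_by: Optional[str] = None
    claim_expires_at: float = 0.0
    done: bool = False


class ToyServer:
    """Hypothetical in-memory model of idempotent creation and ttl-bound claims."""

    def __init__(self, ttl: float = 10.0) -> None:
        self.ttl = ttl
        self.tasks: Dict[str, Task] = {}

    def create(self, promise_id: str) -> bool:
        # A repeated id resolves against the existing entry; nothing new is started.
        if promise_id in self.tasks:
            return False
        self.tasks[promise_id] = Task(promise_id)
        return True

    def claim(self, worker: str, now: float) -> Optional[Task]:
        for task in self.tasks.values():
            lease_free = task.claimed_by is None or now >= task.claim_expires_at
            if not task.done and lease_free:
                task.claimed_by = worker
                task.claim_expires_at = now + self.ttl
                return task
        return None  # nothing claimable right now

    def complete(self, promise_id: str, worker: str) -> None:
        task = self.tasks[promise_id]
        if task.claimed_by == worker:
            task.done = True


server = ToyServer(ttl=10)
created_first = server.create("job-1")        # new task, waits until a worker claims it
created_again = server.create("job-1")        # same id: deduplicated
server.claim("worker-a", now=0)               # worker-a claims, then "crashes"
blocked = server.claim("worker-b", now=5)     # lease still held by worker-a
recovered = server.claim("worker-b", now=10)  # lease expired, worker-b takes over
server.complete("job-1", "worker-b")
print(created_first, created_again, blocked, recovered.claimed_by)
```

Passing the clock in as an argument keeps the sketch deterministic; a real worker would see the same sequence spread over wall-clock time.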
What this example does not cover, by design:
- No retry policy is configured on the call. If compute_something raises, the durable promise rejects; the caller never observes it because the handle is discarded.
- No ctx.run checkpoint inside the function. The body is a single time.sleep plus two prints; a mid-function crash retries the whole function on recovery, not a sub-step.
- No fan-out, no compensation, no scheduling.
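The consequence of discarding the handle can be illustrated with an analogy from the standard library. The sketch below uses concurrent.futures, not Resonate's Handle API, and the negative-cost failure is an invented condition for the sake of the example: a failure is recorded on the future either way, but only a caller that keeps the handle ever sees it.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_something(compute_cost: int) -> int:
    # Invented failure condition, purely for illustration.
    if compute_cost < 0:
        raise ValueError("compute_cost must be non-negative")
    return compute_cost


with ThreadPoolExecutor(max_workers=1) as pool:
    # Fire-and-forget: the failure is stored on the future, but nobody looks at it.
    _ = pool.submit(compute_something, -1)

    # Keeping the handle lets the caller observe the failure.
    handle = pool.submit(compute_something, -1)
    try:
        handle.result()
        observed = None
    except ValueError as exc:
        observed = str(exc)

print(observed)
```

If failures matter to the caller, the equivalent move in the example would be to keep the value returned by begin_rpc instead of assigning it to _.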
When to reach for this pattern
- If you have one stateless function that can run anywhere and you need to scale instances horizontally without writing service-discovery code.
- If the caller wants fire-and-forget semantics for work that needs to survive worker churn — the durable promise outlives any individual worker.
- If you need a worker pool where members can crash mid-execution and the in-flight work has to land on another member, without writing the recovery wiring yourself.
- If you want to deploy a worker group as an independent process pool (one container image, N replicas, no leader, no peer discovery) and address it as a single logical endpoint from the caller side.
- If you are migrating from an HTTP microservice with a load balancer + retry queue to a single-language workflow runtime and want the same shape with the queue, the routing, and the at-least-once delivery handled in one place.
Sources
- Example repo: https://github.com/resonatehq-examples/example-load-balancing-py
- Python SDK repo: https://github.com/resonatehq/resonate-sdk-py
- Python SDK at the pinned version: https://github.com/resonatehq/resonate-sdk-py/tree/v0.6.7
- Resonate documentation (worker groups and targets): https://docs.resonatehq.io/concepts/targets
- Files cited in this post:
  - worker.py:1-26: the worker process (client construction, registration, run loop)
  - invoke.py:1-17: the caller (client construction, RPC target, begin_rpc)
  - pyproject.toml:6-9: Python and SDK pins
  - .python-version:1: Python pin
  - SDK resonate/resonate.py:194-254 (v0.6.7): Resonate.remote(...) classmethod
  - SDK resonate/resonate.py:266-274 (v0.6.7): Resonate.start()
  - SDK resonate/resonate.py:281-314 (v0.6.7): Resonate.options(...) and the target field
  - SDK resonate/resonate.py:316-399 (v0.6.7): @resonate.register (overload stubs 316-331, implementation 332-399)
  - SDK resonate/resonate.py:633-713 (v0.6.7): Resonate.begin_rpc(...)
  - SDK resonate/message_sources/poller.py:49-54: poll://uni@<group> (49-50) / poll://any@<group> (52-54) conventions
  - SDK resonate/conventions/remote.py:59-61: fallback target → poll://any@<target> mapping
- Note on SDK version: pyproject.toml pins resonate-sdk>=0.6.7. The 0.6.x line exposes Resonate.remote(group=..., ...) and Resonate.local(...) as classmethod constructors. The current main branch of resonatehq/resonate-sdk-py has collapsed these into a single Resonate(url=..., group=...) constructor with auto-detection from RESONATE_URL. Against a post-0.6.7 release, verify the constructor signature and re-confirm the poll://any@<group> target string against the SDK you are pinning.
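One way to guard against the constructor change described in the note is a small factory that checks which style the installed class exposes. This is a hedged sketch: make_worker_client, OldStyleClient, and NewStyleClient are hypothetical, the two stub classes only mimic the shapes described in the note, and the real signatures should be checked against the SDK version you pin.

```python
def make_worker_client(resonate_cls, group: str):
    """Build a client with whichever constructor style the given class exposes."""
    remote = getattr(resonate_cls, "remote", None)
    if callable(remote):
        # 0.6.x style: classmethod constructor.
        return remote(group=group)
    # Newer style per the note: plain constructor taking group=...
    return resonate_cls(group=group)


# Hypothetical stand-ins so the sketch runs without the SDK installed.
class OldStyleClient:
    def __init__(self, group: str, via: str = "init") -> None:
        self.group = group
        self.via = via

    @classmethod
    def remote(cls, group: str) -> "OldStyleClient":
        return cls(group, via="remote")


class NewStyleClient:
    def __init__(self, group: str) -> None:
        self.group = group
        self.via = "init"


old_client = make_worker_client(OldStyleClient, "worker-group")
new_client = make_worker_client(NewStyleClient, "worker-group")
print(old_client.via, new_client.via)
```

In a real worker the class passed in would be the imported Resonate class; the stubs exist only to show both branches.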
