May 25, 2026 · 5 min read · Resonate HQ

Load balancing across a worker group in Python on Resonate

How tagging a Python process with `group="worker-group"` and addressing an RPC to `poll://any@worker-group` handles service discovery, load balancing, and crash recovery.

python load-balancing async-rpc for-agents

A single worker process eventually saturates. Running multiple instances then forces the caller to solve service discovery, load balancing, and crash recovery for a pool that may grow, shrink, or lose a member mid-execution. Resonate's approach is to tag each worker process with a group name at construction and let the caller address an RPC to "any member" of that group through the durable promise store. The server routes the invocation to a live worker and re-routes it if the claimant disappears before completing. The example runs N copies of worker.py in "worker-group", then fires repeated non-blocking begin_rpc calls from invoke.py against poll://any@worker-group.

The shape of the solution

from resonate import Resonate
from threading import Event
import time
 
# Initialize an instance of Resonate as a worker in "worker-group"
resonate = Resonate.remote(
    group="worker-group",
)
 
# Register the function with Resonate
@resonate.register
def compute_something(_, id, compute_cost):
    """A function that simulates a computation that takes some time."""
    print(f"starting computation {id}")
    # Using time.sleep(), instead of ctx.sleep(), blocks the thread, simulating a time-consuming task
    time.sleep(compute_cost)
    print(f"computed something that cost {compute_cost} seconds")
    return
 
 
def main():
    # Explicitly start Resonate instance threads
    resonate.start()
    print("worker running...")
    # Keep the main thread alive to allow async tasks to complete
    Event().wait()
# from example-load-balancing-py/worker.py:1-26

from resonate import Resonate
from uuid import uuid4
from random import randint
 
 
# Initialize an instance of Resonate as a client in "invoke-group"
resonate = Resonate.remote(
    group="invoke-group",
)
 
def main():
    # Generate a random compute cost between 1 and 10 seconds
    compute_cost = randint(1, 10)
    # Generate a random promise ID
    promise_id = str(uuid4())
    # Invoke compute_something() on the worker group
    _ = resonate.options(target="poll://any@worker-group").begin_rpc(promise_id, "compute_something", promise_id, compute_cost)
# from example-load-balancing-py/invoke.py:1-17

compute_something is a regular function (not a generator) — the example does not use ctx.run, ctx.sleep, or any context primitive. The durability and routing work happens entirely at the boundaries: at registration on the worker, and at the RPC target on the caller.

The durable primitives in play

Resonate.remote(group="worker-group"): constructs a Resonate client that uses a Resonate Server as both the durable promise store and the anycast message source, and joins the named group. worker.py:6-8. SDK constructor at v0.6.7: resonate-sdk-py/resonate/resonate.py:194-254.
@resonate.register: registers compute_something under its Python name and version 1 in the worker's local Registry, so that an invocation the server routes by the name string "compute_something" can be resolved in this process. worker.py:11. SDK: resonate-sdk-py/resonate/resonate.py:316-399 (v0.6.7), with overload stubs at 316-331 and the implementation at 332-399.
resonate.start(): starts the worker's bridge, processor threads, and the Poller message source that long-polls the Resonate Server for tasks addressed to this group. worker.py:23. SDK: resonate-sdk-py/resonate/resonate.py:266-274 (v0.6.7).
resonate.options(target="poll://any@worker-group"): returns a per-call copy of the client with the RPC target overridden to anycast against the named group. invoke.py:17. The poll://any@<group> form is the SDK's anycast convention; see resonate-sdk-py/resonate/message_sources/poller.py:52-54 and resonate-sdk-py/resonate/conventions/remote.py:59-61.
resonate.begin_rpc(id, "compute_something", promise_id, compute_cost): creates a durable promise keyed on id, dispatches the invocation to the targeted group, and returns a Handle[R] without blocking. The example discards the handle. invoke.py:17. SDK: resonate-sdk-py/resonate/resonate.py:633-713 (v0.6.7).
Worker ttl on claimed tasks: defaults to 10 seconds (ttl=10 in Resonate.remote's signature). When a worker claims a task it holds the claim for at most ttl seconds; if it dies without completing, the task becomes re-claimable by another member of the group. resonate-sdk-py/resonate/resonate.py:206 (v0.6.7).
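
To make the target convention concrete, here is a minimal sketch of how a string such as poll://any@worker-group splits into transport, delivery mode, and group. It uses only the Python standard library. Target, parse_target, and normalize_target are hypothetical helper names for illustration and are not part of resonate-sdk-py; the bare-name fallback is an assumption modeled on the "fallback target" mapping cited for conventions/remote.py, not a copy of that code.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Target(NamedTuple):
    scheme: str  # transport, e.g. "poll"
    mode: str    # "any" (anycast) or "uni" (unicast)
    group: str   # worker group name


def normalize_target(target: str) -> str:
    """Assumed fallback: treat a bare group name as an anycast poll target."""
    return target if "://" in target else f"poll://any@{target}"


def parse_target(target: str) -> Target:
    """Split a target string of the form <scheme>://<mode>@<group>."""
    parts = urlsplit(target)
    if not parts.scheme or parts.username is None or not parts.hostname:
        raise ValueError(f"not a <scheme>://<mode>@<group> target: {target!r}")
    if parts.username not in ("any", "uni"):
        raise ValueError(f"unknown delivery mode: {parts.username!r}")
    # Note: urlsplit lowercases the host part, so group names come back lowercase.
    return Target(parts.scheme, parts.username, parts.hostname)


parsed = parse_target(normalize_target("worker-group"))
print(parsed)
```

The point of the sketch is that the caller's entire routing decision is carried by one string: the group name says where, and the mode says whether any member may take the call.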

What the SDK handles vs. what you write

| SDK handles | You write |
| --- | --- |
| Discovering which worker instances are currently in `worker-group` and routing each RPC to one of them | `group="worker-group"` on the worker, `target="poll://any@worker-group"` on the caller |
| Persisting a durable promise per invocation keyed on the caller-supplied `promise_id`, so a duplicate `begin_rpc` with the same id resolves against the existing promise rather than starting a new execution | `promise_id = str(uuid4())` (`invoke.py:15`), or any deterministic id you want to use as an idempotency key |
| Holding the task claim for `ttl` seconds and re-routing it to another worker if the claimant dies | Nothing; `ttl=10` is the default |
| Long-polling the Resonate Server for tasks addressed to `worker-group` from each worker process | Calling `resonate.start()` once per worker (`worker.py:23`) |
| Resolving the function name string `"compute_something"` to the registered function in the receiving worker's registry | Registering the function once with `@resonate.register` (`worker.py:11`) and invoking by name string (`invoke.py:17`) |

The caller does not know how many workers exist, where they run, or which one will pick up a given call. The worker code has no scheduling logic, no health endpoint, no service-discovery client. Both ends are written as if a single in-process function call were happening, and the durable promise plus the anycast target are doing the routing and recovery.
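
The pull-based shape described above can be imitated in a few lines of standard-library Python. This is a toy analogy, not how the Resonate Server or SDK is implemented: a single in-process queue stands in for "tasks addressed to the group", threads stand in for worker processes, and run_pool is a hypothetical name. What it is meant to show is that the caller only ever names the queue, while whichever worker is free pulls the next task.

```python
import queue
import threading
from collections import Counter


def run_pool(num_workers: int, num_tasks: int) -> Counter:
    """Run num_tasks through num_workers threads that pull from one shared queue."""
    group_queue = queue.Queue()  # stands in for "tasks addressed to the group"
    handled = Counter()          # worker name -> number of tasks it picked up
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            task = group_queue.get()  # blocks until work arrives, like a long poll
            if task is None:          # sentinel: no more work for this worker
                return
            with lock:
                handled[name] += 1

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()

    # The caller only names the queue; it never chooses or even sees a worker.
    for task_id in range(num_tasks):
        group_queue.put(task_id)

    for _ in threads:
        group_queue.put(None)  # one shutdown sentinel per worker
    for thread in threads:
        thread.join()
    return handled


counts = run_pool(num_workers=3, num_tasks=12)
print(sum(counts.values()), "tasks handled by", len(counts), "worker(s)")
```

How the twelve tasks split across workers varies from run to run, which is the behavior a caller of an anycast target should expect: every task is handled once, by a member the caller did not pick.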

Failure modes covered

No worker is available when the invocation lands. The durable promise is created on the server when begin_rpc is called. If no worker in worker-group is currently polling, the task waits on the server until one is, then gets routed. invoke.py:17 plus the server's anycast routing.
A worker crashes after claiming the task but before completing it. The claim expires after the worker's ttl (10 seconds by default; resonate-sdk-py/resonate/resonate.py:206), the task becomes re-claimable, and another worker in the group picks it up. The README confirms this explicitly: "If you kill one of the workers while it is in the middle of handling executions, you will see the executions recover on another worker" (README.md:77).
The caller fires the same promise_id twice. begin_rpc is keyed on id. A second call with the same id resolves against the existing durable promise rather than triggering a second execution. The example happens to use a fresh uuid4() per call (invoke.py:15), which means each invocation is intentionally a new execution; the dedupe guarantee is still there if a caller supplies a stable id.
The caller process exits immediately after dispatching. The invocation has already been persisted as a durable promise before begin_rpc returns the handle. Discarding the handle (_ = ...) is safe — the worker will run the function and resolve the promise on the server regardless of whether anyone is still subscribed. invoke.py:17.
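
The failure modes above can be traced through a small in-memory model. This is a sketch under stated assumptions, not the Resonate Server's implementation: ToyStore and Task are hypothetical names, time is passed in explicitly as now so the scenario is deterministic, and claim renewal is left out. It models three things from the list: a task that waits until a worker asks for it, a duplicate id that resolves to the existing entry, and a lapsed claim that another worker can take over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    promise_id: str
    state: str = "pending"            # "pending" or "resolved"
    claimed_by: Optional[str] = None
    lease_expires_at: float = 0.0
    executions: int = 0               # how many times a worker started it


class ToyStore:
    """Hypothetical in-memory stand-in for a durable promise store."""

    def __init__(self, ttl: float = 10.0) -> None:
        self.ttl = ttl
        self.tasks: dict[str, Task] = {}

    def create(self, promise_id: str) -> Task:
        """Idempotent create: a repeated id returns the existing task."""
        return self.tasks.setdefault(promise_id, Task(promise_id))

    def claim(self, worker: str, now: float) -> Optional[Task]:
        """Hand out a pending task that is unclaimed or whose lease has lapsed."""
        for task in self.tasks.values():
            lapsed = task.claimed_by is None or now >= task.lease_expires_at
            if task.state == "pending" and lapsed:
                task.claimed_by = worker
                task.lease_expires_at = now + self.ttl
                task.executions += 1
                return task
        return None

    def complete(self, task: Task, worker: str) -> bool:
        """Resolve the task, but only for the worker that currently holds it."""
        if task.state != "pending" or task.claimed_by != worker:
            return False
        task.state = "resolved"
        return True


store = ToyStore(ttl=10.0)
first = store.create("job-1")                    # no worker exists yet; task waits
duplicate = store.create("job-1")                # same id: same task, no new execution
claimed_a = store.claim("worker-a", now=0.0)     # worker-a claims, then crashes
too_early = store.claim("worker-b", now=5.0)     # lease still held: nothing to claim
claimed_b = store.claim("worker-b", now=10.0)    # lease lapsed: worker-b takes over
done = store.complete(claimed_b, "worker-b")
print(first.state, first.claimed_by, first.executions)
```

In this model the function body runs twice, once on each worker, which matches the at-least-once behavior the post describes for a crash between claim and completion.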

What this example does not cover, by design:

No retry policy is configured on the call. If compute_something raises, the durable promise rejects; the caller never observes it because the handle is discarded.
No ctx.run checkpoint inside the function. The body is a single time.sleep plus two prints; a mid-function crash retries the whole function on recovery, not a sub-step.
No fan-out, no compensation, no scheduling.
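
The first point, that a discarded handle hides a failure, has a close analogy in the standard library. The sketch below uses concurrent.futures rather than Resonate's Handle API, so it is only an illustration of the pattern; the compute_something here is a stand-in that raises on a negative cost, which the example's real function does not do.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_something(job_id: str, compute_cost: int) -> None:
    """Stand-in task that fails on invalid input (an assumption for this sketch)."""
    if compute_cost < 0:
        raise ValueError(f"job {job_id}: compute_cost must be non-negative")


with ThreadPoolExecutor(max_workers=1) as pool:
    # Fire and forget: the failure is recorded on the future, but nobody looks.
    _ = pool.submit(compute_something, "job-1", -1)

    # Keeping the handle lets the caller observe the failure later.
    handle = pool.submit(compute_something, "job-2", -1)
    try:
        handle.result()
        observed = None
    except ValueError as exc:
        observed = str(exc)

print(observed)
```

Both jobs fail in the same way; only the second failure reaches the caller, because only the second handle was kept and awaited.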

When to reach for this pattern

If you have one stateless function that can run anywhere and you need to scale instances horizontally without writing service-discovery code.
If the caller wants fire-and-forget semantics for work that needs to survive worker churn — the durable promise outlives any individual worker.
If you need a worker pool where members can crash mid-execution and the in-flight work has to land on another member, without writing the recovery wiring yourself.
If you want to deploy a worker group as an independent process pool (one container image, N replicas, no leader, no peer discovery) and address it as a single logical endpoint from the caller side.
If you are migrating from an HTTP microservice with a load balancer + retry queue to a single-language workflow runtime and want the same shape with the queue, the routing, and the at-least-once delivery handled in one place.

Sources

Example repo: https://github.com/resonatehq-examples/example-load-balancing-py
Python SDK repo: https://github.com/resonatehq/resonate-sdk-py
Python SDK at the pinned version: https://github.com/resonatehq/resonate-sdk-py/tree/v0.6.7
Resonate documentation (worker groups and targets): https://docs.resonatehq.io/concepts/targets
Files cited in this post:
- worker.py:1-26 — the worker process: client construction, registration, run loop
- invoke.py:1-17 — the caller: client construction, RPC target, begin_rpc
- pyproject.toml:6-9 — Python and SDK pins
- .python-version:1 — Python pin
- SDK resonate/resonate.py:194-254 (v0.6.7) — Resonate.remote(...) classmethod
- SDK resonate/resonate.py:266-274 (v0.6.7) — Resonate.start()
- SDK resonate/resonate.py:281-314 (v0.6.7) — Resonate.options(...) and the target field
- SDK resonate/resonate.py:316-399 (v0.6.7) — @resonate.register (overload stubs 316-331, implementation 332-399)
- SDK resonate/resonate.py:633-713 (v0.6.7) — Resonate.begin_rpc(...)
- SDK resonate/message_sources/poller.py:49-54 — poll://uni@<group> (49-50) / poll://any@<group> (52-54) conventions
- SDK resonate/conventions/remote.py:59-61 — fallback target → poll://any@<target> mapping

Note on SDK version: pyproject.toml requires resonate-sdk>=0.6.7. The 0.6.x line exposes Resonate.remote(group=..., ...) and Resonate.local(...) as classmethod constructors. The current main branch of resonatehq/resonate-sdk-py has collapsed these into a single Resonate(url=..., group=...) constructor with auto-detection from RESONATE_URL. Against a post-0.6.7 release, verify the constructor signature and re-confirm the poll://any@<group> target string against the SDK version you install.