May 25, 2026 · 4 min read · Resonate HQ

Load balancing across worker instances in Rust on Resonate

How a worker group plus a `poll://any@workers` target collapses service discovery, load balancing, and crash recovery into config strings.

rust load-balancing for-agents

A single worker process saturates under load and disappears when it crashes. Scaling out to N instances fixes capacity but forces the application to solve three new problems: service discovery, balanced dispatch, and recovery. Resonate moves all three concerns into the server: workers register into a named group, callers target the group rather than a specific process, and the server picks an available worker and reassigns the work if that worker dies. The example-load-balancing-rs repo demonstrates this with a worker binary that registers one function and a client binary that fires spawn() RPCs against poll://any@workers.

The shape of the solution

```rust
use std::time::Duration;
// Assumed: `Context` and `Result` are imported from the resonate crate
// (see the example repo for the exact paths).

#[resonate::function]
async fn compute_something(ctx: &Context, id: String, compute_cost: u64) -> Result<()> {
    println!("{id} starting computation");
    // Durable sleep simulates a time-consuming task. It survives restarts:
    // if this worker crashes mid-sleep, another worker resumes it on the
    // remaining time.
    ctx.sleep(Duration::from_secs(compute_cost)).await?;
    println!("{id} computed something that cost {compute_cost} seconds");
    Ok(())
}
// from example-load-balancing-rs/src/bin/worker.rs:11
```
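The comment in compute_something says a recovering worker waits out only the remaining time. One plausible way to get that behavior is to persist an absolute deadline when the timer is first created, so whichever worker resumes the execution can recompute what is left. The sketch below is a hypothetical, std-only model of that idea, not the SDK's implementation; `TimerPromise`, `create`, and `remaining` are illustrative names, and the current time is passed in as Unix-epoch milliseconds to keep the example deterministic.

```rust
use std::time::Duration;

/// Hypothetical model of a durable timer. The deadline is stored as an
/// absolute timestamp (milliseconds since the Unix epoch) rather than as a
/// duration, so a worker that picks the timer up later can compute what is left.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimerPromise {
    pub deadline_ms: u64,
}

impl TimerPromise {
    /// Created once, by the first worker that reaches the sleep.
    pub fn create(now_ms: u64, sleep_for: Duration) -> Self {
        TimerPromise {
            deadline_ms: now_ms + sleep_for.as_millis() as u64,
        }
    }

    /// Called by whichever worker resumes the execution. Returns zero if the
    /// deadline already passed (for example, the outage outlasted the sleep).
    pub fn remaining(&self, now_ms: u64) -> Duration {
        Duration::from_millis(self.deadline_ms.saturating_sub(now_ms))
    }
}
```

For example, a 10-second sleep created at 1,000 ms has a deadline of 11,000 ms; a worker that resumes it at 5,000 ms waits 6 seconds, and one that resumes after 11,000 ms does not wait at all.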

The function is plain async Rust. The group affiliation lives in the Resonate::new config, not in the function body:

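```rust
// Every process started from this binary joins the "workers" group.
let resonate = Resonate::new(ResonateConfig {
    url: Some("http://localhost:8001".into()),
    group: Some("workers".into()),
    ..Default::default()
});

// Make compute_something dispatchable by its name.
resonate.register(compute_something).unwrap();
// from example-load-balancing-rs/src/bin/worker.rs:27
```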
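`resonate.register(compute_something)` is what lets a client later name the function as the string "compute_something". Below is a hypothetical, std-only model of that idea: a name-keyed table of handlers. `Registry`, `Handler`, and `dispatch` are illustrative names rather than SDK APIs, and the real SDK's functions are async and take a `Context`, which this sketch leaves out.

```rust
use std::collections::HashMap;

// Hypothetical handler shape: (invocation id, compute cost) -> result string.
pub type Handler = fn(&str, u64) -> Result<String, String>;

/// Hypothetical name-keyed registry, standing in for what `register` sets up on a worker.
#[derive(Default)]
pub struct Registry {
    handlers: HashMap<String, Handler>,
}

impl Registry {
    pub fn register(&mut self, name: &str, handler: Handler) {
        self.handlers.insert(name.to_string(), handler);
    }

    /// Look up a handler by the string name a client sent, then run it.
    pub fn dispatch(&self, name: &str, id: &str, cost: u64) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(id, cost),
            None => Err(format!("no function registered as {name}")),
        }
    }
}

// Synchronous stand-in for compute_something (no durable sleep here).
pub fn compute_something(id: &str, compute_cost: u64) -> Result<String, String> {
    Ok(format!("{id} computed something that cost {compute_cost} seconds"))
}
```

Dispatching by string is what decouples the client binary from the worker binary: the client only needs to know the name, never the function's address or process.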

Every process started from this binary joins the workers pool. Run three, run thirty — the only thing each instance does on boot is poll the server for work targeted at its group.
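To see why "poll the server for work targeted at its group" spreads load without a separate load balancer, here is a hypothetical in-process model of pull-based dispatch: a shared FIFO queue stands in for the server's per-group queue, and each thread stands in for one worker instance. Whichever worker asks first gets the next task. This is a sketch of the general technique under a simple-FIFO assumption; `run_pool` is an illustrative name, and nothing here describes how the Resonate Server actually stores or orders work.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Run `num_workers` threads that each repeatedly pull the next task id from a
/// shared FIFO queue until it is empty. Returns (worker index, task id) pairs.
pub fn run_pool(tasks: Vec<u32>, num_workers: usize) -> Vec<(usize, u32)> {
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let log = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..num_workers)
        .map(|worker| {
            let queue = Arc::clone(&queue);
            let log = Arc::clone(&log);
            thread::spawn(move || loop {
                // "Poll": take one task while holding the lock; the guard is
                // dropped at the end of this statement.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some(task) => log.lock().unwrap().push((worker, task)),
                    None => break, // queue drained; this worker stops
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    // Every thread has joined, so this is the last remaining Arc.
    Arc::try_unwrap(log).unwrap().into_inner().unwrap()
}
```

Adding capacity means starting another poller; nothing else changes. The model does not promise an even split: a thread that happens to run first may take most of the tasks, but each task is claimed exactly once.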

The client is symmetrical: it joins a different group ("client") and fires the RPC with a target string that names the worker group, not a specific worker:

```rust
// Target the group, not a process: any worker in "workers" may claim this.
let _handle = resonate
    .rpc::<_, ()>(&id, "compute_something", (id.clone(), compute_cost))
    .target("poll://any@workers")
    .spawn()
    .await
    .expect("rpc spawn failed");
// from example-load-balancing-rs/src/bin/client.rs:23
```

spawn() completes once the invocation's durable promise exists on the server and returns a handle without waiting for compute_something to finish, so the client process can exit right after dispatching one invocation. The Resonate Server holds the work until a worker in the workers group claims it.
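The dispatch-then-exit behavior can be modeled as a server that records a pending promise and enqueues its id for the target group before any worker is involved. This is a hypothetical in-memory sketch (`Server`, `Handle`, `PromiseState`, `claim`, and `resolve` are illustrative names, not SDK types) that only shows the ordering: `spawn` succeeds with zero workers running, and the work waits in the group's queue until one claims it.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PromiseState {
    Pending,
    Resolved(String),
}

/// Hypothetical stand-in for the server side: durable promises keyed by id,
/// plus one FIFO queue per worker group.
#[derive(Default)]
pub struct Server {
    pub promises: HashMap<String, PromiseState>,
    pub queues: HashMap<String, VecDeque<String>>,
}

/// What the sketch's `spawn` hands back: just the id for later lookup.
pub struct Handle {
    pub id: String,
}

impl Server {
    /// Fire-and-forget dispatch: record the promise, enqueue it for the group,
    /// and return at once. No worker needs to be running for this to succeed.
    pub fn spawn(&mut self, id: &str, group: &str) -> Handle {
        self.promises.insert(id.to_string(), PromiseState::Pending);
        self.queues
            .entry(group.to_string())
            .or_default()
            .push_back(id.to_string());
        Handle { id: id.to_string() }
    }

    /// A worker in `group` polls for the next invocation, if any.
    pub fn claim(&mut self, group: &str) -> Option<String> {
        self.queues.get_mut(group)?.pop_front()
    }

    /// The worker reports the result, completing the promise.
    pub fn resolve(&mut self, id: &str, value: &str) {
        self.promises
            .insert(id.to_string(), PromiseState::Resolved(value.to_string()));
    }
}
```

Between `spawn` and `claim` there may be no worker at all, which is the "all workers down" case: the backlog simply waits in the queue.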

The durable primitives in play

Worker groups via ResonateConfig { group: ... } — every process that starts with group: Some("workers".into()) joins the same dispatch pool. src/bin/worker.rs:27-31 (worker joins workers), src/bin/client.rs:9-13 (client joins client); SDK at resonate-sdk-rs/resonate/src/resonate.rs:121 (group resolution) and :215 (group used as the default target resolver).
#[resonate::function] + resonate.register(compute_something) — registers compute_something under its name so any client can dispatch it by string. src/bin/worker.rs:11, src/bin/worker.rs:33.
resonate.rpc(id, name, args).target("poll://any@workers").spawn() — durable, fire-and-forget RPC: creates a root promise keyed on id, routes the dispatch to any worker in the named group, returns a handle without blocking on the result. src/bin/client.rs:23-28; SDK at resonate-sdk-rs/resonate/src/resonate.rs:361 (rpc), :940 (target on ResRpcTask), :948 (spawn).
ctx.sleep(Duration) — durable timer promise used here to simulate a multi-second compute. Survives worker restarts; on resumption another worker waits out only the remaining time. src/bin/worker.rs:17.
poll://any@workers target string — the routing primitive: scheme poll://, selector any, group workers. The server uses this to pick whichever process in workers claims the work first. src/bin/client.rs:25.
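The target-string anatomy in the list above (scheme, selector, group) can be made concrete with a small parser. This is a hypothetical sketch that accepts only the `scheme://selector@group` shape described here; `Target` and `parse_target` are illustrative names, and the SDK's actual grammar may accept other forms that this sketch rejects.

```rust
/// Hypothetical parsed form of a target such as "poll://any@workers".
#[derive(Debug, PartialEq, Eq)]
pub struct Target<'a> {
    pub scheme: &'a str,   // transport, e.g. "poll"
    pub selector: &'a str, // which group member, e.g. "any"
    pub group: &'a str,    // worker group name, e.g. "workers"
}

/// Split "scheme://selector@group" into its three parts.
/// Returns None if a separator is missing or any part is empty.
pub fn parse_target(s: &str) -> Option<Target<'_>> {
    let (scheme, rest) = s.split_once("://")?;
    let (selector, group) = rest.split_once('@')?;
    if scheme.is_empty() || selector.is_empty() || group.is_empty() {
        return None;
    }
    Some(Target { scheme, selector, group })
}
```

Read this way, pointing the client at a different pool is a one-string change to the group part; the function name and arguments stay the same.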

What the SDK handles vs. what you write

You write: one #[resonate::function], one Resonate::new with group: Some("workers".into()), and one resonate.register(...) line on the worker; on the client, one resonate.rpc(...).target("poll://any@workers").spawn() call. That is the entire surface.

The SDK and Resonate Server handle: subscribing each worker process to the workers group's poll queue, persisting every invocation as a durable promise keyed on the caller-supplied id, picking an available worker when a new invocation is dispatched, transferring ownership of an in-flight execution to another worker in the group when the original dies, replaying durable checkpoints (here, the ctx.sleep timer) on the recovering worker so it resumes from the remaining wait rather than from the top of the function, and deduplicating concurrent dispatches that share an id so a retried client doesn't double-run the work. Nothing in compute_something mentions service discovery, leader election, heartbeats, locks, or recovery, and nothing in main() does either.
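Ownership transfer can be pictured as a table mapping each in-flight invocation to the worker that claimed it; when a worker is declared dead, its entries go back on the group's queue. The sketch below is a hypothetical model of that bookkeeping only (`Group`, `claim`, `complete`, `worker_died`, and `owner_of` are illustrative names). It assumes something else decides a worker is dead and does not model that detection, nor the checkpoint replay that lets the new owner resume instead of restarting.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical model of in-flight ownership for one worker group.
/// Deciding that a worker is dead is out of scope; assume some mechanism
/// calls `worker_died` when that happens.
#[derive(Default)]
pub struct Group {
    queue: VecDeque<String>,
    owner: HashMap<String, String>, // invocation id -> worker id
}

impl Group {
    pub fn enqueue(&mut self, id: &str) {
        self.queue.push_back(id.to_string());
    }

    /// A worker claims the next invocation and becomes its owner.
    pub fn claim(&mut self, worker: &str) -> Option<String> {
        let id = self.queue.pop_front()?;
        self.owner.insert(id.clone(), worker.to_string());
        Some(id)
    }

    /// The owner finished; the invocation leaves the group for good.
    pub fn complete(&mut self, id: &str) {
        self.owner.remove(id);
    }

    /// Put every invocation owned by a dead worker back on the queue so
    /// another worker in the group can claim it.
    pub fn worker_died(&mut self, worker: &str) {
        let mut orphaned: Vec<String> = self
            .owner
            .iter()
            .filter(|(_, w)| w.as_str() == worker)
            .map(|(id, _)| id.clone())
            .collect();
        orphaned.sort(); // HashMap order is arbitrary; keep requeue order stable
        for id in orphaned {
            self.owner.remove(&id);
            self.queue.push_back(id);
        }
    }

    pub fn owner_of(&self, id: &str) -> Option<&str> {
        self.owner.get(id).map(String::as_str)
    }
}
```

Work owned by surviving workers is untouched; only the dead worker's invocations change hands.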

Failure modes covered

One worker crashes mid-execution. Killing a worker while compute_something is in its ctx.sleep releases the in-flight durable promise back to the workers group's queue; another worker claims it and resumes from the remaining sleep duration, not from the top of the function. The README states this explicitly: "If you kill one of the workers while it is in the middle of handling executions, you will see the executions recover on another worker. The durable ctx.sleep survives the crash, so the recovered execution waits out only the remaining time." (README.md:76).
All workers down when the client dispatches. spawn() only creates the durable promise on the server; the dispatch enqueues against the workers group and waits for any worker to come online. Restarting workers is enough to drain the backlog. The client exits regardless (src/bin/client.rs:31: resonate.stop().await.ok();).
A single worker overloaded by burst traffic. target("poll://any@workers") routes each invocation to a worker that has capacity to claim it; running additional worker instances absorbs the burst without changing application code. The README invites the test directly: "As you invoke more and more executions, you will see them start to spread across the multiple worker instances." (README.md:74).
Same id dispatched twice. The RPC is keyed on the promise id (uuid::Uuid::new_v4().to_string() per dispatch in src/bin/client.rs:15); a second rpc with an already-PENDING id attaches to the existing execution rather than starting a parallel one. Because this client mints a fresh UUID on every run, rerunning it always starts a new execution; deduplication applies when a higher layer retries with the same id.
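Keying dispatch on a caller-supplied id is what makes a retry safe. The hypothetical sketch below reduces that to a set of known ids: the first dispatch of an id creates an execution, and a repeat attaches to it. `Promises`, `Dispatch`, and `executions` are illustrative names; the sketch treats any known id as a duplicate and does not model what happens after a promise resolves.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of dispatching an RPC with a given promise id.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    Created,  // first time this id was seen: a new execution starts
    Attached, // id already known: the caller joins the existing execution
}

/// Hypothetical promise store keyed by the caller-supplied id.
#[derive(Default)]
pub struct Promises {
    ids: HashSet<String>,
}

impl Promises {
    pub fn dispatch(&mut self, id: &str) -> Dispatch {
        // HashSet::insert returns false if the id was already present.
        if self.ids.insert(id.to_string()) {
            Dispatch::Created
        } else {
            Dispatch::Attached
        }
    }

    /// Number of distinct executions started so far.
    pub fn executions(&self) -> usize {
        self.ids.len()
    }
}
```

With a fresh UUID per run, as in the example client, every call lands in the `Created` branch; only a caller that reuses an id reaches `Attached`.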

When to reach for this pattern

If you need to scale a single function out to many workers without writing service-discovery, load-balancer, or registry code in the application.
If a worker that crashes mid-task must have its in-flight work recovered onto another worker without re-executing already-completed durable steps.
If you want clients (or other workflows) to address a pool by name (poll://any@workers) rather than holding references to specific processes or addresses.
If invocations are independent units of work (here, a single compute_something per RPC) and any worker in the pool is equally capable of running any one of them.
If you want a fire-and-forget dispatch pattern on the client (.spawn() returns a handle, does not block on the result) while still getting durable execution on the worker side.

Sources

Example repo: github.com/resonatehq-examples/example-load-balancing-rs
Rust SDK repo: github.com/resonatehq/resonate-sdk-rs
SDK primitives cited:
- resonate/src/resonate.rs — Resonate::new, ResonateConfig.group, Resonate::register, Resonate::rpc, ResRpcTask::target, ResRpcTask::spawn
- resonate/src/context.rs — Context::sleep
Docs:
- Load balancing example
- Rust SDK guide