Resonate HQ

Async HTTP API with durable workers in Rust on Resonate

How an axum gateway dispatches durable work to a worker pool and lets clients poll a non-blocking status endpoint.


An HTTP API that accepts long-running work needs to survive process restarts and worker crashes without losing the request. The shape of the Resonate solution is to split the system in two: a stateless HTTP gateway that dispatches a durable promise to a separate worker group and returns immediately, plus a polling endpoint that reads completion state from the Resonate Server rather than from gateway memory. The example-async-http-api-rs repo shows this with axum and the Rust SDK in roughly 200 lines across two binaries.

The shape of the solution

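```rust
// ... inside async fn begin(...) ...
// Create the durable promise and hand the dispatch to the "worker" group.
// `.spawn().await` waits only for the promise to be created, not for foo to finish.
match state
    .resonate
    .rpc::<_, Value>(&id, "foo", data)
    .target("poll://any@worker")
    .spawn()
    .await
{
    // Reply right away with the promise id and a URL the client can poll.
    Ok(handle) => (
        StatusCode::OK,
        Json(json!({
            "promise": handle.id,
            "status": "pending",
            "wait": format!("/wait?id={}", id),
        })),
    )
        .into_response(),
    // ...
}
// from example-async-http-api-rs/src/bin/gateway.rs:72
```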

The gateway never awaits the workflow itself. .spawn() resolves as soon as the durable promise has been created on the Resonate Server, and the handler immediately returns the promise id (in the "promise" field), a "pending" status, and the polling URL.
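To make the non-blocking hand-off concrete, here is a minimal std-only sketch that models it without the SDK. ToyServer, Handle, spawn, and begin are hypothetical names: ToyServer stands in for the Resonate Server with an in-memory promise table and dispatch queue, and the JSON body is assembled by hand for illustration. The point it shows is that begin returns once the promise exists and the task is queued, without ever running foo.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory stand-in for the Resonate Server (not the SDK's API).
#[derive(Default)]
struct ToyServer {
    /// Promise table: id -> None while pending, Some(value) once resolved.
    promises: HashMap<String, Option<String>>,
    /// Dispatches waiting for a worker: (id, function_name, args).
    queue: VecDeque<(String, String, String)>,
}

/// Hypothetical handle: it carries only the promise id.
struct Handle {
    id: String,
}

impl ToyServer {
    /// Record the promise, queue the dispatch, and return at once.
    /// No worker runs here; this sketch also ignores deduplication.
    fn spawn(&mut self, id: &str, func: &str, args: &str) -> Handle {
        self.promises.insert(id.to_string(), None);
        self.queue
            .push_back((id.to_string(), func.to_string(), args.to_string()));
        Handle { id: id.to_string() }
    }
}

/// Shape of the /begin handler: spawn, then reply with the id and a polling URL.
fn begin(server: &mut ToyServer, id: &str, data: &str) -> String {
    let handle = server.spawn(id, "foo", data);
    format!(
        r#"{{"promise":"{}","status":"pending","wait":"/wait?id={}"}}"#,
        handle.id, id
    )
}
```

After begin returns, the promise is still pending and the task sits in the queue; finishing it is entirely the worker's job.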

The worker side is a single registered function:

```rust
// Registered by name, so the gateway can dispatch it as "foo".
#[resonate::function]
async fn foo(_ctx: &Context, data: Value) -> Result<Value> {
    println!("processing on worker: {data}");

    // Milliseconds since the Unix epoch; falls back to 0 if the system
    // clock reads earlier than 1970.
    let now_ms = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0);

    Ok(json!({
        "result": format!("Processed: {data}"),
        "timestamp": now_ms,
    }))
}

// ... inside #[tokio::main] async fn main() ...
resonate.register(foo).unwrap();
// from example-async-http-api-rs/src/bin/worker.rs:11
```

The gateway runs in group "gateway" and the worker in group "worker"; the target("poll://any@worker") string in the RPC builder routes the work to any process in the worker group (src/bin/gateway.rs:33, src/bin/worker.rs:30, src/bin/gateway.rs:75).
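The target string packs the whole routing decision into one value. The sketch below parses it and picks an eligible process. Target, parse_target, and pick are hypothetical; the reading of the three parts (transport, selector, group) is inferred from this example rather than taken from the SDK's parser, and rotating through eligible processes with a counter is an assumption made only so the result is deterministic.

```rust
/// Hypothetical parsed form of a target such as "poll://any@worker".
#[derive(Debug, PartialEq)]
struct Target<'a> {
    transport: &'a str, // "poll": the transport part (meaning assumed here)
    selector: &'a str,  // "any": any process in the group may take the task
    group: &'a str,     // "worker": the group the worker process joined
}

/// Split "transport://selector@group" into its parts; None if malformed.
fn parse_target(s: &str) -> Option<Target<'_>> {
    let (transport, rest) = s.split_once("://")?;
    let (selector, group) = rest.split_once('@')?;
    if transport.is_empty() || selector.is_empty() || group.is_empty() {
        return None;
    }
    Some(Target { transport, selector, group })
}

/// Choose a process for one dispatch. `members` lists (group, process id)
/// pairs; with selector "any", every member of the target group is eligible.
fn pick<'p>(target: &Target<'_>, members: &[(&str, &'p str)], counter: usize) -> Option<&'p str> {
    if target.selector != "any" {
        return None; // other selectors are outside this sketch
    }
    let eligible: Vec<&'p str> = members
        .iter()
        .filter(|(group, _)| *group == target.group)
        .map(|(_, process)| *process)
        .collect();
    if eligible.is_empty() {
        None
    } else {
        // Hypothetical rotation among eligible processes.
        Some(eligible[counter % eligible.len()])
    }
}
```

Processes in the "gateway" group are never eligible for this target, which is why the gateway can stay free of work even though it shares the same server.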

The durable primitives in play

  • resonate.rpc(id, name, args).target(...).spawn() — creates a durable promise with the supplied id, enqueues the function-name dispatch for the target group, returns a handle without awaiting completion. Deduplication is keyed on id: a second call with the same id reattaches rather than starting new work. src/bin/gateway.rs:72-78.
  • resonate.get::<T>(&id) — re-attaches to an existing durable promise from a cold start. The /wait handler uses this so the gateway holds no in-memory state between requests. src/bin/gateway.rs:113.
  • handle.done() — non-blocking check on completion state; documented in the SDK as "Check if the promise is done (non-blocking)." src/bin/gateway.rs:124; SDK at resonate/src/handle.rs:62.
  • handle.result() — blocks until the promise completes, then decodes the resolved value or returns the rejection error. The /wait handler calls it only after handle.done() has returned true, so in practice it returns immediately instead of blocking. src/bin/gateway.rs:137; SDK at resonate/src/handle.rs:48-60.
  • #[resonate::function] + resonate.register(foo) — registers a function under a name the gateway can dispatch by string. src/bin/worker.rs:11, src/bin/worker.rs:34.
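Taken together, the /wait handler reduces to a small decision on the promise's state. This std-only sketch models that decision with a hypothetical PromiseState enum and a HashMap standing in for resonate.get(&id). The pending and rejected bodies follow the shapes this post describes; the "resolved" and "unknown" status strings and the hand-rolled JSON quoting are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical promise states as the gateway reads them back from the server.
enum PromiseState {
    Pending,
    Resolved(String),
    Rejected(String),
}

/// Minimal JSON string quoting for the sketch: escapes backslashes and quotes only.
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Shape of the /wait handler. `store.get(id)` stands in for `resonate.get(&id)`.
/// The Pending arm plays the role of `handle.done()` returning false; the settled
/// arms correspond to calling `handle.result()` only once the promise is done,
/// so nothing here ever waits.
fn wait(store: &HashMap<String, PromiseState>, id: &str) -> String {
    match store.get(id) {
        // Hypothetical: how the real handler treats unknown ids is not shown here.
        None => r#"{"status":"unknown"}"#.to_string(),
        Some(PromiseState::Pending) => r#"{"status":"pending"}"#.to_string(),
        Some(PromiseState::Resolved(v)) => {
            format!(r#"{{"status":"resolved","result":{}}}"#, json_str(v))
        }
        Some(PromiseState::Rejected(e)) => {
            format!(r#"{{"status":"rejected","error":{}}}"#, json_str(e))
        }
    }
}
```

Because the handler holds no state of its own, any gateway process can answer a poll for any id.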

What the SDK handles vs. what you write

You write: the axum routes, the function body (foo), the choice of promise id for deduplication, and the group strings ("gateway" / "worker").

The SDK and Resonate Server handle: persisting the durable promise before .spawn() returns, routing the dispatch to a process in the worker group, redispatching to another worker if the first one disappears, surfacing completion state to any process that later calls resonate.get(id), and decoding resolved-vs-rejected results into a typed Result<T>. The gateway process never tracks which worker took the job, and the worker process never tracks which gateway requested it; the two sides exchange only (id, function_name, args) through the server.
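The claim that the two sides share only (id, function_name, args) can be illustrated with a toy model: a registry mapping names to function pointers on the worker side, and a promise table keyed by id on the server side. Registry, Server, spawn, and run_pending are hypothetical; spawn returning false for a repeated id models the id-keyed deduplication described above, not the SDK's actual return type.

```rust
use std::collections::HashMap;

/// A registered function: takes the args, returns a resolved value or a rejection.
type Func = fn(&str) -> Result<String, String>;

/// Hypothetical worker-side registry, playing the role of `resonate.register(foo)`.
#[derive(Default)]
struct Registry {
    funcs: HashMap<&'static str, Func>,
}

impl Registry {
    fn register(&mut self, name: &'static str, f: Func) {
        self.funcs.insert(name, f);
    }
}

/// Hypothetical server: promise table keyed by id plus queued (id, function_name, args).
#[derive(Default)]
struct Server {
    results: HashMap<String, Option<Result<String, String>>>,
    pending: Vec<(String, String, String)>,
}

impl Server {
    /// Returns true if this call created new work, false if it attached to an
    /// existing promise with the same id (no second task is queued).
    fn spawn(&mut self, id: &str, func: &str, args: &str) -> bool {
        if self.results.contains_key(id) {
            return false;
        }
        self.results.insert(id.to_string(), None);
        self.pending
            .push((id.to_string(), func.to_string(), args.to_string()));
        true
    }

    /// One pass of a worker: resolve each queued task by looking its function up by name.
    fn run_pending(&mut self, registry: &Registry) {
        for (id, func, args) in self.pending.drain(..) {
            let outcome = match registry.funcs.get(func.as_str()) {
                Some(f) => f(args.as_str()),
                None => Err(format!("no function registered as {func}")),
            };
            self.results.insert(id, Some(outcome));
        }
    }
}

/// Stand-in for the post's `foo`.
fn foo(data: &str) -> Result<String, String> {
    Ok(format!("Processed: {data}"))
}
```

Nothing in Server knows which process queued a task or which one ran it; the name string is the only link between the gateway's dispatch and the worker's registered function.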

Note that the worker function in this minimal example is a single straight-line block — it doesn't yet use ctx.run(...). The doc comment at src/bin/worker.rs:7-10 calls this out: real workloads would wrap each side-effecting step in ctx.run(step_fn, ...) so that a crash mid-function resumes from the last successful step rather than from the top. The example demonstrates the dispatch-and-poll shell; per-step checkpointing is the next layer.
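The per-step checkpointing that ctx.run would add can be sketched as a journal of completed step outputs. This is a simplified model under stated assumptions, not the SDK's ctx.run: Journal, Run, step, body, and the three step names are hypothetical, a crash is simulated as an error return, steps are matched by their position in the function, and the journal lives in memory rather than on the Resonate Server.

```rust
/// Hypothetical journal of completed step outputs, standing in for what the
/// server would persist for each checkpointed step.
#[derive(Default)]
struct Journal {
    completed: Vec<String>,
}

/// One attempt at running the function against the journal.
struct Run<'j> {
    journal: &'j mut Journal,
    cursor: usize,               // index of the next step in this attempt
    executed: Vec<&'static str>, // steps whose bodies actually ran this attempt
}

impl<'j> Run<'j> {
    fn new(journal: &'j mut Journal) -> Self {
        Run { journal, cursor: 0, executed: Vec::new() }
    }

    /// Checkpointed step: if this step number is already journaled, return the
    /// recorded output without re-running `f`; otherwise run `f` and record it.
    fn step(
        &mut self,
        name: &'static str,
        f: impl FnOnce() -> Result<String, String>,
    ) -> Result<String, String> {
        let i = self.cursor;
        self.cursor += 1;
        if let Some(recorded) = self.journal.completed.get(i) {
            return Ok(recorded.clone());
        }
        self.executed.push(name);
        let out = f()?; // on failure nothing is recorded, so a retry reruns this step
        self.journal.completed.push(out.clone());
        Ok(out)
    }
}

/// Hypothetical step body; `fail_at` simulates a crash inside the named step.
fn body(name: &str, fail_at: Option<&str>) -> Result<String, String> {
    if fail_at == Some(name) {
        Err(format!("crash in {name}"))
    } else {
        Ok(format!("{name}:ok"))
    }
}

/// A three-step function written in the checkpointed style.
fn workflow(run: &mut Run<'_>, fail_at: Option<&str>) -> Result<String, String> {
    let a = run.step("reserve", || body("reserve", fail_at))?;
    let b = run.step("charge", || body("charge", fail_at))?;
    let c = run.step("notify", || body("notify", fail_at))?;
    Ok(format!("{a},{b},{c}"))
}
```

If the first attempt fails inside "charge", the second attempt replays "reserve" from the journal and runs only "charge" and "notify". A step that crashes after its side effect but before its output is recorded will run again, so steps checkpointed this way should be idempotent.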

Failure modes covered

  • Worker crashes mid-execution. The durable promise lives on the Resonate Server, not in worker memory. When the crashed worker stops heartbeating, the server re-dispatches the work to another process in the worker group. The dispatch target poll://any@worker (src/bin/gateway.rs:75) is what makes this load-balanced redelivery work.
  • Gateway crashes between /begin and the client's first /wait poll. The gateway holds no in-memory map of id → handle. On restart, /wait calls resonate.get(&id) (src/bin/gateway.rs:113) and reads the current state from the server.
  • Client retries /begin with the same id. Resonate deduplicates by id. The second .spawn() call returns a handle to the in-flight (or already-resolved) promise rather than starting duplicate work. The id query parameter on /begin (src/bin/gateway.rs:66) is what gives clients a stable idempotency key.
  • Client polls /wait before the workflow has finished (or even started), and may keep polling for a long time. handle.done() returns false without blocking the gateway thread (src/bin/gateway.rs:124). The gateway responds with {"status":"pending"} and frees the connection.
  • Workflow rejects with an application error. handle.result() returns Err; the /wait handler maps that to {"status":"rejected", "error": ...} instead of crashing the gateway (src/bin/gateway.rs:147-156).
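The worker-crash case depends on the server noticing that a worker has stopped heartbeating. The sketch below models that with leases. The lease table, the 10-second TTL, and the claim/heartbeat rules are assumptions for illustration, not Resonate's implementation, and time is a plain logical counter so the behavior is deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical lease table: task id -> (worker id, lease expiry in logical ms).
#[derive(Default)]
struct Leases {
    held: HashMap<String, (String, u64)>,
}

impl Leases {
    /// Hypothetical lease length; a live worker must heartbeat within this window.
    const TTL: u64 = 10_000;

    /// A worker in the group tries to take a task. It succeeds if no one holds
    /// the task or the previous holder's lease has expired (it stopped heartbeating).
    fn claim(&mut self, task: &str, worker: &str, now: u64) -> bool {
        let free = match self.held.get(task) {
            Some((_, expiry)) => *expiry <= now,
            None => true,
        };
        if free {
            self.held
                .insert(task.to_string(), (worker.to_string(), now + Self::TTL));
        }
        free
    }

    /// A heartbeat from the current holder extends its lease; anyone else is refused.
    fn heartbeat(&mut self, task: &str, worker: &str, now: u64) -> bool {
        match self.held.get_mut(task) {
            Some((holder, expiry)) if holder.as_str() == worker && *expiry > now => {
                *expiry = now + Self::TTL;
                true
            }
            _ => false,
        }
    }
}
```

In this model, a worker that crashes at any point simply stops calling heartbeat, and the task becomes claimable by another worker once the lease runs out. A worker that is slow rather than dead can also lose its lease, so the function body should tolerate being executed more than once.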

When to reach for this pattern

  • If you're exposing an HTTP endpoint that triggers work that can run longer than a reasonable HTTP timeout (anywhere from seconds to hours) and the client should not hold a connection open.
  • If the same logical request might be sent more than once (network retries, client-side retry loops) and you want the second request to attach to the first rather than duplicate the work.
  • If you want the HTTP frontend and the work-doing process to scale and crash independently — gateway pods can be replaced without dropping in-flight work, and worker pods can be replaced without dropping in-flight requests.
  • If a status/result endpoint must keep working across full restarts of every process in the system — i.e., recovery cannot depend on any specific process being up.
  • If you eventually want each step inside the worker function to be individually checkpointed (DB writes, external API calls, long computations), the same registration shape extends with ctx.run(...) calls inside foo.

Sources