Resonate HQ

Long-lived human-in-the-loop step in Rust with an HTTP gateway

How a workflow suspends on a latent durable promise and resumes when an HTTP handler resolves it by ID.

A workflow that needs a human decision (approval, sign-off, click-a-link) has to pause for an arbitrary amount of time — seconds, hours, days — without losing its place if the process running it dies. With Resonate, that pause is an await on a latent durable promise: a promise that no function backs, that lives on the Resonate server, and that anything reachable on the network can resolve by ID. The example-human-in-the-loop-rs repo demonstrates this with two binaries — a worker that runs the workflow and an axum HTTP gateway that both starts workflows and resolves their blocking promises.
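The core idea of resolving a promise by ID can be modeled without any Resonate code. The sketch below is a hypothetical in-memory PromiseStore, not the Resonate SDK or server API. One thread blocks on a PENDING promise, and another thread settles it knowing only its ID. Durability is deliberately missing, since that is the part the Resonate server adds. Unlike the server, which assigns promise IDs, this model lets the caller pick the ID. The names PromiseStore, create, resolve, wait, and approval-1 are illustrative assumptions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Settlement state of one promise in this toy model.
#[derive(Debug, PartialEq)]
enum State {
    Pending,
    Resolved(bool),
}

/// Hypothetical stand-in for the server-side promise store.
/// NOT durable: all state lives in this process's memory.
#[derive(Default)]
struct PromiseStore {
    promises: Mutex<HashMap<String, State>>,
    settled: Condvar,
}

impl PromiseStore {
    /// Create a "latent" promise: just an ID in the PENDING state, no function behind it.
    fn create(&self, id: &str) {
        self.promises.lock().unwrap().insert(id.to_string(), State::Pending);
    }

    /// Settle by ID; the caller needs nothing but the ID and the store.
    fn resolve(&self, id: &str, value: bool) {
        self.promises
            .lock()
            .unwrap()
            .insert(id.to_string(), State::Resolved(value));
        self.settled.notify_all();
    }

    /// Block until the promise with this ID is no longer PENDING.
    fn wait(&self, id: &str) -> bool {
        let guard = self.promises.lock().unwrap();
        let guard = self
            .settled
            .wait_while(guard, |map| map.get(id) == Some(&State::Pending))
            .unwrap();
        match guard.get(id) {
            Some(State::Resolved(value)) => *value,
            _ => panic!("unknown promise id {id}"),
        }
    }
}

/// A "workflow" thread waits on the promise; the main thread plays the gateway.
fn run_demo() -> bool {
    let store = Arc::new(PromiseStore::default());
    store.create("approval-1");

    let waiter = {
        let store = Arc::clone(&store);
        thread::spawn(move || store.wait("approval-1"))
    };

    store.resolve("approval-1", true);
    waiter.join().unwrap()
}

fn main() {
    println!("approved = {}", run_demo());
}
```

Because wait_while checks its condition before sleeping, a resolve that lands before the waiter starts is not lost. In that case the waiter returns immediately.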

The shape of the solution

```rust
#[resonate::function]
async fn foo(ctx: &Context, workflow_id: String) -> Result<String> {
    // Latent durable promise — no function backing it. Resolved externally.
    let blocking_promise = ctx.promise::<bool>();
    let promise_id = blocking_promise.id().await?;

    // Make the promise ID reachable from outside (email, webhook, log, ...).
    ctx.run(send_email, promise_id.clone()).await?;
    println!("blocked, waiting on human interaction (workflow {workflow_id})");

    // Suspend until the promise resolves. Survives crashes.
    let _approved = blocking_promise.await?;
    println!("unblocked, promise resolved");

    Ok(format!("workflow {workflow_id} completed"))
}
// from example-human-in-the-loop-rs/src/bin/worker.rs:11
```

The gateway side is two axum HTTP handlers. start_workflow starts the workflow by RPC, and resolve_promise, shown here, resolves the latent promise by ID:

```rust
async fn resolve_promise(
    State(state): State<AppState>,
    Path(promise_id): Path<String>,
) -> impl IntoResponse {
    state
        .resonate
        .promises
        .resolve(&promise_id, json!(true))
        .await
        .expect("failed to resolve promise");

    Json(json!({ "message": "promise resolved" }))
}
// from example-human-in-the-loop-rs/src/bin/gateway.rs:72
```
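The handler above turns any resolve error into a panic via .expect(...). One alternative is to map errors to HTTP statuses. The std-only sketch below assumes a hypothetical store whose resolve returns an error for unknown IDs and for promises that are already settled. Whether the Resonate server actually errors on a re-settle is not established here. Promises, ResolveError, resolve_handler, and the choice of 404 and 409 are illustrative, and the (status, body) tuple stands in for an axum response.

```rust
use std::collections::HashMap;

/// Why a resolve attempt failed in this hypothetical store.
#[derive(Debug, PartialEq)]
enum ResolveError {
    NotFound,
    AlreadySettled,
}

/// Toy promise table: None = PENDING, Some(v) = RESOLVED with value v.
struct Promises {
    table: HashMap<String, Option<bool>>,
}

impl Promises {
    /// Assumed semantics: settling twice is an error rather than a no-op.
    fn resolve(&mut self, id: &str, value: bool) -> Result<(), ResolveError> {
        let slot = self.table.get_mut(id).ok_or(ResolveError::NotFound)?;
        if slot.is_some() {
            return Err(ResolveError::AlreadySettled);
        }
        *slot = Some(value);
        Ok(())
    }
}

/// Map each outcome to an HTTP status and JSON body instead of panicking.
/// The (status, body) pair stands in for an axum response.
fn resolve_handler(promises: &mut Promises, id: &str) -> (u16, String) {
    match promises.resolve(id, true) {
        Ok(()) => (200, r#"{"message":"promise resolved"}"#.to_string()),
        Err(ResolveError::NotFound) => (404, r#"{"error":"unknown promise id"}"#.to_string()),
        Err(ResolveError::AlreadySettled) => {
            (409, r#"{"error":"promise already settled"}"#.to_string())
        }
    }
}

fn main() {
    let mut promises = Promises {
        table: HashMap::from([("p-1".to_string(), None)]),
    };
    // First click settles the promise; the second gets a 409 instead of a panic.
    for _ in 0..2 {
        let (status, body) = resolve_handler(&mut promises, "p-1");
        println!("{status} {body}");
    }
}
```

If the real server treats a repeated resolve with the same value as a no-op, returning 200 for the second click would be the more natural mapping.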

The durable primitives in play

  • ctx.promise::<bool>() — creates a latent durable promise on the Resonate server with no function backing it; the workflow's continuation is gated on its resolution. worker.rs:14.
  • blocking_promise.id().await? — surfaces the server-assigned promise ID so external systems can target it. worker.rs:15.
  • ctx.run(send_email, ...) — durable local invocation of a leaf function; checkpointed so the notification side-effect is not re-issued on replay. worker.rs:18.
  • blocking_promise.await? — suspends the workflow until the durable promise settles. If the worker process dies while suspended, the promise stays PENDING on the server; another worker recovers the workflow and waits on the same promise. worker.rs:22.
  • resonate.rpc(workflow_id, "foo", ...).target("poll://any@workers") — RPC into the workers group, keyed by workflow_id. Resonate deduplicates by ID: a second call with the same workflow_id reconnects to the PENDING execution instead of starting a new one. gateway.rs:59-60.
  • resonate.promises.resolve(&promise_id, json!(true)) — settles the latent promise from outside any workflow context; this is what unblocks the workflow. gateway.rs:79.
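The deduplication described for resonate.rpc can be pictured as get-or-create keyed by workflow ID. Below is a hypothetical in-memory Registry, not the SDK. The first rpc call for an ID launches an execution, and later calls with the same ID observe the existing execution, including its stored result once it has completed. Registry, rpc, complete, and launched are illustrative names.

```rust
use std::collections::HashMap;

/// State of a top-level execution in this toy model.
#[derive(Clone, Debug, PartialEq)]
enum Execution {
    Pending,
    Resolved(String),
}

/// Hypothetical registry of executions keyed by workflow ID.
#[derive(Default)]
struct Registry {
    executions: HashMap<String, Execution>,
    launched: u32, // how many executions were actually started
}

impl Registry {
    /// Start-or-reconnect: only the first call for an ID launches work;
    /// later calls observe the existing execution instead.
    fn rpc(&mut self, workflow_id: &str) -> Execution {
        if let Some(existing) = self.executions.get(workflow_id) {
            return existing.clone();
        }
        self.launched += 1;
        self.executions
            .insert(workflow_id.to_string(), Execution::Pending);
        Execution::Pending
    }

    /// Record the result so later callers with the same ID receive it.
    fn complete(&mut self, workflow_id: &str, result: &str) {
        self.executions
            .insert(workflow_id.to_string(), Execution::Resolved(result.to_string()));
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.rpc("wf-1"); // launches
    registry.rpc("wf-1"); // retry: reconnects, no second launch
    registry.complete("wf-1", "workflow wf-1 completed");
    let state = registry.rpc("wf-1"); // late retry: gets the stored result
    println!("{state:?}, launched = {}", registry.launched);
}
```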

What the SDK handles vs. what you write

You write: the foo workflow function (one ctx.promise(), one ctx.run, one await), a leaf send_email function that prints the callback URL, and two axum handlers (start_workflow, resolve_promise). About 140 lines across both binaries (53 in worker.rs, 85 in gateway.rs).

The SDK and the Resonate server handle the rest: creating the latent promise record and persisting its ID; suspending the worker's execution while the promise stays PENDING on the server; recovering the workflow onto another worker if the suspended one dies; deduplicating RPCs by ID, so re-invoking with the same workflow_id reconnects to the existing execution rather than re-running it; persisting the ctx.run checkpoint so send_email is not repeated on replay; and unblocking the suspended await when promises.resolve is called. None of that bookkeeping appears in user code.
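One way to picture checkpointing a ctx.run step is replay against a journal. Each step gets a position in call order, and a step whose result is already recorded is returned from the journal instead of being executed again. The sketch models that with a hypothetical ReplayCtx and an in-memory Journal shared between two simulated worker runs. It illustrates the idea only and is not how the Resonate Rust SDK is implemented.

```rust
use std::collections::HashMap;

/// Hypothetical journal: step position -> recorded result. A durable runtime
/// would persist this; here it is an in-memory map handed from one simulated
/// worker run to the next.
type Journal = HashMap<usize, String>;

/// Minimal replay context: steps are numbered in call order, and a step
/// whose result is already in the journal is not executed again.
struct ReplayCtx<'a> {
    journal: &'a mut Journal,
    next_step: usize,
}

impl ReplayCtx<'_> {
    fn run<F: FnOnce() -> String>(&mut self, step: F) -> String {
        let position = self.next_step;
        self.next_step += 1;
        if let Some(recorded) = self.journal.get(&position) {
            return recorded.clone(); // replay: skip the side effect
        }
        let result = step();
        // Recorded only AFTER the step returns: a crash in between would re-run it.
        self.journal.insert(position, result.clone());
        result
    }
}

/// One pass of a workflow shaped like foo: a notification step, then a wait.
/// `approval` is None while the human has not answered (the run "suspends").
fn workflow(journal: &mut Journal, emails_sent: &mut u32, approval: Option<bool>) -> Option<String> {
    let mut ctx = ReplayCtx { journal, next_step: 0 };
    let _notice = ctx.run(|| {
        *emails_sent += 1; // the side effect that must not repeat on replay
        "sent callback link".to_string()
    });
    let approved = approval?;
    Some(format!("approved = {approved}"))
}

fn main() {
    let mut journal = Journal::new();
    let mut emails_sent = 0;
    // Worker A: sends the email, then suspends (and, say, crashes).
    let first = workflow(&mut journal, &mut emails_sent, None);
    // Worker B: replays after the human approves; the email step comes from the journal.
    let second = workflow(&mut journal, &mut emails_sent, Some(true));
    println!("{first:?} {second:?} emails_sent = {emails_sent}");
}
```

Because this model records a step's result only after the step returns, a crash between the side effect and the journal write would re-execute the step on replay.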

Failure modes covered

  • Worker crashes while suspended on human input. The latent promise lives on the Resonate server (worker.rs:14), not in the worker's memory. After the worker dies, the promise is still PENDING; the server reassigns the workflow to another worker in the workers group, which resumes at blocking_promise.await? on the same promise ID (worker.rs:22).
  • The HTTP gateway receives the same start_workflow request twice. The RPC is keyed on req.workflow_id (gateway.rs:59), so the second call reconnects to the PENDING execution started by the first and awaits the same result rather than starting a parallel workflow. If that execution has already completed (its promise is RESOLVED), the second call returns the stored result.
  • The human clicks the callback URL again after the promise has already resolved. By then the workflow is past blocking_promise.await? and has completed, so a second promises.resolve call cannot change its result. The gateway does not handle this case defensively, though. What the server returns for re-settling a RESOLVED promise depends on its semantics, and the handler calls .expect("failed to resolve promise") (gateway.rs:81), so if the server returns an error, the handler panics instead of returning the "promise resolved" JSON (gateway.rs:83).
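  • The send_email step runs and then the worker crashes before suspending. ctx.run(send_email, ...) is a durable checkpoint (worker.rs:18). Once its completion has been recorded, recovery replays from the checkpoint and does not call send_email again, so the user does not get two emails for one workflow. A crash after send_email runs but before its completion is recorded could still re-execute it, as with any checkpoint taken after a side effect.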
  • Indefinite wait. There is no timeout in the example; the workflow can sit on blocking_promise.await? for as long as the promise remains PENDING. The SDK exposes .timeout(...) on ctx.promise() if a bounded wait is wanted.
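If a bounded wait is wanted, the general shape is to wait until the promise is resolved or a deadline passes, whichever comes first. The sketch below shows that shape using std's Condvar::wait_timeout_while on a single in-process slot. It is not the SDK's .timeout(...) API. Unlike a durable timeout, the deadline here lives only in this process's memory. Slot, wait_bounded, and demo are illustrative names.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// One in-process stand-in for a promise: None while PENDING.
type Slot = Arc<(Mutex<Option<bool>>, Condvar)>;

/// Wait for a resolution, giving up after `limit`.
/// Returns None on timeout, which a workflow could treat as "escalate" or "reject".
fn wait_bounded(slot: &Slot, limit: Duration) -> Option<bool> {
    let (lock, cvar) = &**slot;
    let guard = lock.lock().unwrap();
    let (guard, _timed_out) = cvar
        .wait_timeout_while(guard, limit, |value| value.is_none())
        .unwrap();
    *guard
}

/// Optionally resolve the slot from another thread after `delay`, then wait.
fn demo(resolve_after: Option<Duration>, limit: Duration) -> Option<bool> {
    let slot: Slot = Arc::new((Mutex::new(None), Condvar::new()));
    if let Some(delay) = resolve_after {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            thread::sleep(delay);
            let (lock, cvar) = &*slot;
            *lock.lock().unwrap() = Some(true);
            cvar.notify_all();
        });
    }
    wait_bounded(&slot, limit)
}

fn main() {
    // Never resolved: gives up after 50 ms.
    println!("{:?}", demo(None, Duration::from_millis(50)));
    // Resolved after 10 ms: well inside the 5 s limit.
    println!("{:?}", demo(Some(Duration::from_millis(10)), Duration::from_secs(5)));
}
```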

When to reach for this pattern

  • If a workflow needs to block on a human action (approval, signature, click-to-confirm) whose response time is bounded only by the human, not by a process lifetime.
  • If you have a long-running approval flow currently implemented as a state machine in a database, polled by a cron job — and you want it to read as straight-line async code instead.
  • If a webhook from a third party (DocuSign, Stripe Connect, identity verification) is the resume signal, and you need the workflow to remain reconnectable across deploys and restarts while it waits.
  • If multiple workers should be able to recover any suspended workflow, not just the one that started it.
  • If you want callers to be able to safely retry start_workflow requests without spawning duplicate executions.

Sources