A business process that needs to pause for hours, days, or longer cannot rely on holding a single Python process open: the longer a process lives, the more likely it is to crash. Resonate replaces the in-process wait with a server-backed timer promise: the workflow suspends, the worker becomes free, and any worker in the group resumes the workflow when the timer fires. The example-durable-sleep-py repo shows this in roughly 25 lines split between a worker script and a client script.
The shape of the solution
    from resonate import Resonate, Context
    from threading import Event

    resonate = Resonate.remote(group="worker")

    @resonate.register
    def sleeping_workflow(ctx: Context, wf_id: str, secs: float):
        print(f"Workflow {wf_id} starting, will sleep for {secs} seconds.")
        yield ctx.sleep(secs)
        return f"Workflow {wf_id} completed after sleeping for {secs} seconds."

    resonate.start()
    print("worker is running...")
    Event().wait()

    # from example-durable-sleep-py/worker.py:1-16

The whole pattern is one line: yield ctx.sleep(secs). The workflow is a generator function: yielding the ctx.sleep command hands control back to the SDK, which records the durable timer promise on the Resonate Server and releases this worker thread. The function only resumes (from the same yield point) once the server-side timer has fired.
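The suspend-and-resume behaviour described above rests on ordinary Python generator mechanics. The following is a minimal stdlib-only sketch of those mechanics, not the SDK's implementation: the Sleep class and the drive function are hypothetical stand-ins for the command that ctx.sleep produces and for the runtime that consumes it.

```python
from dataclasses import dataclass


@dataclass
class Sleep:
    """Hypothetical command object; stands in for whatever ctx.sleep yields."""
    secs: float


def sleeping_workflow(wf_id: str, secs: float):
    # Code before the yield runs on the first next() call.
    yield Sleep(secs)
    # Code after the yield runs only once the driver resumes the generator.
    return f"Workflow {wf_id} completed after sleeping for {secs} seconds."


def drive(gen):
    """Toy driver: collect every yielded command, then return the final value."""
    commands = []
    try:
        cmd = next(gen)            # run up to the first yield
        while True:
            commands.append(cmd)   # a durable runtime would persist a timer here
            cmd = gen.send(None)   # resume from the same yield point
    except StopIteration as done:
        return commands, done.value


commands, result = drive(sleeping_workflow("wf-1", 5.0))
print(commands)
print(result)
```

The toy driver resumes immediately; in the durable version the gap between receiving the command and resuming is where the server-side timer lives.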
The client dispatches the workflow to the worker group by promise id and registered function name:
    from resonate import Resonate

    resonate = Resonate.remote(group="client")

    def main():
        try:
            id = "sleep-workflow-1"
            func = "sleeping_workflow"
            secs = 5.0
            handle = resonate.options(target="poll://any@worker").begin_rpc(id, func=func, wf_id=id, secs=secs)
            result = handle.result()
            print(result)
        except Exception as e:
            print(e)

    main()

    # from example-durable-sleep-py/client.py:1-16

The worker process is registered into the worker group (worker.py:4), the client into the client group (client.py:3), and poll://any@worker is the target string that routes the RPC to any process polling the worker group.
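One way to read the target string is as a URL-shaped value. The sketch below uses only the standard library to split it into its visible parts; the labels transport, selector, and group are names chosen here for illustration, and the source does not show how the SDK itself parses the string.

```python
from urllib.parse import urlsplit

target = "poll://any@worker"
parts = urlsplit(target)

# Labels are illustrative assumptions, not SDK terminology.
transport = parts.scheme     # the part before "://"
selector = parts.username    # the part before "@"
group = parts.hostname       # the part after "@"

print(transport, selector, group)
```

Read this way, the string names the worker group from worker.py and asks for any member of it, which matches the routing behaviour described above.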
The durable primitives in play
- ctx.sleep(secs: float): creates a durable timer promise on the Resonate Server (Sleep convention) and yields back to the SDK so the workflow suspends until the timer resolves. See worker.py:9; SDK at resonate/resonate.py:1137 (Context.sleep).
- @resonate.register: registers sleeping_workflow under its own name in the function registry so the client can dispatch it by string. See worker.py:6; SDK at resonate/resonate.py:317 (Resonate.register, first overload).
- Resonate.remote(group=...): classmethod that builds a Resonate client wired to a RemoteStore and a Poller message source bound to the given group. See worker.py:4, client.py:3; SDK at resonate/resonate.py:195 (in v0.6.7).
- resonate.options(target=...).begin_rpc(id, func=..., **kwargs): non-blocking durable RPC from outside a workflow. It creates a root durable promise keyed on id, routes the dispatch to the named target, and returns a Handle. See client.py:10; SDK at resonate/resonate.py:618 (Resonate.begin_rpc).
- handle.result(): blocks the calling thread until the durable promise resolves and returns the workflow's return value. See client.py:11.
What the SDK handles vs. what you write
You write: the generator workflow body, the duration argument, the promise id on the client side, and the group strings ("worker" / "client"). That is the entire surface.
The SDK and the Resonate Server handle:
- creating the timer promise from ctx.sleep(secs) with a timeout_at derived from the duration;
- suspending the workflow generator the moment it yields the sleep command (the SDK records the yield and releases the worker thread);
- persisting the timer across worker restarts;
- firing the timer at the requested wall-clock time;
- dispatching the resumed workflow to any process polling the worker group via the poll://any@worker target;
- replaying the generator up to the same yield ctx.sleep(...) point and feeding the resolved value back in;
- decoding the eventual return value back into a Python string on the client side of handle.result().

The worker process holds no in-memory wait state: between yield ctx.sleep(secs) releasing control and the workflow resuming after the timer fires, no Python coroutine or thread is kept alive on the worker for this workflow.
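The replay step is easiest to see with a toy model. The sketch below is an assumption-laden illustration, not the SDK's code: a plain dict stands in for the Resonate Server, the attempt function stands in for one worker picking up the workflow, and the child promise id scheme (root id plus a step counter) is hypothetical.

```python
def sleeping_workflow(wf_id, secs):
    yield ("sleep", secs)  # tuple stands in for the ctx.sleep command
    return f"Workflow {wf_id} completed after sleeping for {secs} seconds."


def attempt(store, root_id, fn, *args):
    """One worker attempt: replay from the top, stop at the first unresolved step."""
    gen = fn(*args)  # fresh generator, so no in-memory state is carried over
    step = 0
    try:
        command = next(gen)
        while True:
            child_id = f"{root_id}.{step}"  # hypothetical id scheme
            record = store.setdefault(
                child_id, {"command": command, "resolved": False}
            )
            if not record["resolved"]:
                return "suspended", child_id  # give the worker back
            command = gen.send(None)  # step already resolved: replay past it
            step += 1
    except StopIteration as done:
        return "completed", done.value


server = {}  # stands in for durable state held by the server
first = attempt(server, "sleep-workflow-1", sleeping_workflow, "sleep-workflow-1", 5.0)
server[first[1]]["resolved"] = True  # simulate the timer firing
second = attempt(server, "sleep-workflow-1", sleeping_workflow, "sleep-workflow-1", 5.0)
print(first)
print(second)
```

The second attempt shares nothing with the first except the dict, which is the point: whichever process runs it needs only the durable record to reach the same yield and continue.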
Failure modes covered
- Worker crashes mid-sleep. The timer promise lives on the Resonate Server, not in the worker process. When the timer fires, the server dispatches the resumption to any process polling the worker group; this is the poll://any@worker target on the client's RPC (client.py:10) plus the worker's group registration (worker.py:4). Restart the worker after killing it mid-sleep and the workflow recovers from the server-side timer and finishes.
- Sleep duration exceeds any single process lifetime. Because the wait is a server-side timer promise, not a time.sleep(...) call inside Python, durations of days or years do not depend on a single process surviving that long. The SDK's Context.sleep docstring states this directly: "There is no maximum sleep duration. Fractional (floating-point) values are supported for sub-second precision." (resonate/resonate.py:1144).
- Client retries with the same id. The RPC is keyed on the promise id ("sleep-workflow-1" in client.py:7); a second begin_rpc with the same id reconnects to the existing pending execution rather than starting a second sleep. The SDK's begin_rpc docstring states: "If a durable promise with the same id already exists, Resonate will reuse the existing execution state or subscribe to its result instead of starting a new one." (resonate/resonate.py:644).
- No worker available when the timer fires. The timer still fires on the server; the resumed workflow waits in the queue for the worker group until a process polls for work. Restarting the worker is enough to drain it.
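The retry behaviour in the third bullet amounts to create-if-absent keyed on the promise id. Here is a minimal sketch of that idea, assuming a hypothetical in-memory ToyPromiseStore; the real store lives on the Resonate Server and its interface is not shown in the source.

```python
class ToyPromiseStore:
    """Hypothetical in-memory stand-in for a durable promise store."""

    def __init__(self):
        self._promises = {}

    def create(self, promise_id, payload):
        """Return (promise, created); an existing id is reused, never replaced."""
        existing = self._promises.get(promise_id)
        if existing is not None:
            return existing, False
        promise = {"id": promise_id, "payload": payload, "state": "pending"}
        self._promises[promise_id] = promise
        return promise, True


store = ToyPromiseStore()
p1, created1 = store.create("sleep-workflow-1", {"secs": 5.0})
p2, created2 = store.create("sleep-workflow-1", {"secs": 60.0})  # retry, same id
print(created1, created2, p2["payload"])
```

In this toy the second call attaches to the first promise and its different payload is ignored, mirroring the "reuse the existing execution state" wording quoted from the begin_rpc docstring.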
When to reach for this pattern
- If a workflow must pause for longer than is comfortable to keep a Python process running (anything from minutes upward, with no upper bound).
- If you would otherwise reach for a cron job, scheduled task, or external timer service purely to re-awaken an idle process — and the only reason for the second system is the duration of the wait.
- If you want the wait to be a single yield in the workflow body rather than splitting the workflow into "before the wait" and "after the wait" callables connected by external state.
- If the resumed work must run on whichever worker is healthy at the moment the timer fires, not necessarily the one that started the wait.
- If you need the wait to be idempotent under retries — a second invocation with the same id during the sleep window must attach to the in-flight timer, not start a second one.
Sources
- Example repo: github.com/resonatehq-examples/example-durable-sleep-py
- Python SDK repo: github.com/resonatehq/resonate-sdk-py
- SDK primitives cited (pinned to resonate-sdk>=0.6.7), all in resonate/resonate.py: Resonate.remote, Resonate.register, Resonate.begin_rpc, Context.sleep
- Docs:
