A multi-turn LLM agent that interleaves model calls, third-party API calls (Serper, Browserless), and blocking user input must survive a worker restart mid-conversation without re-paying for completed LLM calls or losing prior turns. With Resonate, the solution is to register the agent loop as a generator workflow and run every external interaction (LLM completion, web search, page scrape, user input) as a durable child step via `ctx.lfc`. Each completed step's result is checkpointed in the promise store and replayed from cache after a crash. The example wires an OpenAI chat-completions tool-calling loop with two tools (`internet_search`, `scrape_website`) and a synchronous `input()` prompt for the human. All four interactions run through the same durable primitive.
The shape of the solution
```python
# `resonate` (the client built in src/agent.py:20-23) and the step functions
# interact_with_llm, search_internet, scrape_website, and chat_with_user are
# defined elsewhere in the example repo (src/llm.py, src/tools.py, src/agent.py).
@resonate.register
def travel_assistent(ctx):
    messages = [
        {"role": "system", "content": "..."},  # system prompt elided
        {"role": "user", "content": "Plan a trip for me."},  # whitespace trimmed
    ]

    while True:
        # Durable step: the LLM turn is checkpointed and replayed on retry.
        message = yield ctx.lfc(interact_with_llm, messages)

        # Always add the assistant response
        assistant_message = {"role": "assistant", "content": message["content"]}
        if message.get("tool_calls"):
            assistant_message["tool_calls"] = [
                # ... tool_calls dict construction elided
            ]
        messages.append(assistant_message)

        content = message.get("content")
        if content and "TRIP PLANNING COMPLETE" in content:
            break
        elif message.get("tool_calls"):
            # Each tool invocation is its own durable step.
            for tool_call in message["tool_calls"]:
                tool_name = tool_call["name"]
                args = tool_call["args"]
                if tool_name == "internet_search":
                    result = yield ctx.lfc(search_internet, args["search_query"], args.get("num_results", 5))
                elif tool_name == "scrape_website":
                    result = yield ctx.lfc(scrape_website, args["url"])
                else:
                    result = "Unknown tool call"
                messages.append({"role": "tool", "tool_call_id": tool_call["id"], "content": result})
        elif content:
            # The blocking input() prompt also runs as a durable step.
            input_message = yield ctx.lfc(chat_with_user, content)
            messages.append({"role": "user", "content": input_message})

    return message["content"]
# from example-ai-travel-assistant-py/src/agent.py:26-90
```

The orchestrator is a generator function, not `async def`. Every `yield ctx.lfc(fn, ...)` runs `fn` as a durable child step in the same worker process. Once the child completes, its return value is persisted at the call site and replayed from the promise store on retry. As a result, a question the LLM has already answered is not re-asked after a worker crash.
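The replay rule above can be modeled in a few dozen lines. This is a toy driver, not Resonate. `Call`, `Ctx`, `run_workflow`, and the `{workflow_id}.{n}` step-id scheme are assumptions made for illustration, and a plain dict plays the promise store. The demo runs a three-step workflow, simulates a crash in the second step, then replays against the same store.

```python
# Toy model of replay-from-checkpoint. NOT the Resonate SDK: Call, Ctx and
# run_workflow are hypothetical stand-ins for ctx.lfc and the promise store.


class Call:
    """A yielded request to run fn(*args) as a durable step (stand-in for ctx.lfc)."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


class Ctx:
    def lfc(self, fn, *args):
        # Only describes the call; the driver decides whether to execute it.
        return Call(fn, *args)


def run_workflow(workflow, store, workflow_id):
    """Drive a generator workflow, checkpointing each step's result in `store`.

    Step ids are the workflow id plus a per-run counter, so a replay after a
    crash visits the same ids in the same order (assumes a deterministic body).
    """
    gen = workflow(Ctx())
    step = 0
    value = None
    try:
        while True:
            call = gen.send(value)
            step_id = f"{workflow_id}.{step}"
            step += 1
            if step_id in store:
                value = store[step_id]  # replay: reuse the checkpointed result
            else:
                value = call.fn(*call.args)  # first execution of this step
                store[step_id] = value  # checkpoint before moving on
    except StopIteration as done:
        return done.value


# --- demo: crash during the second step, then replay against the same store ---
calls = []
armed = [True]


def llm(prompt):
    calls.append(prompt)
    return f"answer to {prompt}"


def scrape(url):
    calls.append(url)
    if armed[0]:
        armed[0] = False
        raise RuntimeError("worker died mid-scrape")
    return f"<html {url}>"


def agent(ctx):
    plan = yield ctx.lfc(llm, "plan")
    page = yield ctx.lfc(scrape, "example.com")
    summary = yield ctx.lfc(llm, "summarize " + page)
    return [plan, page, summary]


store = {}
try:
    run_workflow(agent, store, "trip-1")
except RuntimeError:
    pass  # simulated worker crash; step "trip-1.0" is already checkpointed

result = run_workflow(agent, store, "trip-1")
print(result)
print(calls)
```

On the replay, `llm("plan")` is served from the store, so `calls` records it only once. `scrape` appears twice because it was in flight when the simulated crash hit.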
The durable primitives in play
- `Resonate.remote()`: constructs the Resonate client wired to a remote Resonate server (the worker polls the server for tasks). `src/agent.py:20`.
- `resonate.set_dependency(name, obj)`: registers worker-process-scoped objects (the OpenAI client, the Serper API key, the Browserless API key) so child functions can retrieve them via `ctx.get_dependency(...)` instead of reading globals. `src/agent.py:21`, `:22`, `:23`.
- `@resonate.register`: registers the top-level workflow under a name so the worker can claim and execute it under a caller-supplied promise id. `src/agent.py:26`.
- `yield ctx.lfc(fn, *args)`: Local Function Call. Runs `fn` as a durable child step inside the same worker; its return value is checkpointed in the promise store and replayed from cache on retry. Used for the LLM call (`src/agent.py:56`), the Serper search (`:80`), the Browserless scrape (`:82`), and the blocking user `input()` (`:87`).
- `ctx.get_dependency(name)`: retrieves a worker-scoped dependency from inside a child function. `src/llm.py:6`, `src/tools.py:49`, `src/tools.py:75`.
- `Function.run(id)`: top-level entry point. The caller supplies the trip id (`trip_id`), which becomes the workflow's promise id; a second invocation with the same id returns a handle to the same promise rather than starting a parallel run. `src/agent.py:98`.
- `handle.result()`: blocks until the durable promise resolves and returns the workflow's return value. `src/agent.py:99`.
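The `set_dependency` / `get_dependency` split is a small dependency-injection pattern. Here is a minimal sketch of the idea. `Registry`, `StepContext`, `search_step`, and the dependency name `serper_api_key` are hypothetical stand-ins, not Resonate SDK APIs or names taken from the example repo.

```python
class Registry:
    """Hypothetical worker-scoped registry: objects set once, read by every step."""

    def __init__(self):
        self._deps = {}

    def set_dependency(self, name, obj):
        # Called at worker start-up (module init in the example).
        self._deps[name] = obj

    def context(self):
        # Each step gets a context that can look up the registered objects.
        return StepContext(self._deps)


class StepContext:
    def __init__(self, deps):
        self._deps = deps

    def get_dependency(self, name):
        # Raises KeyError if the worker never registered `name`.
        return self._deps[name]


def search_step(ctx, query):
    # The step reads its API key from the context rather than a module global,
    # so a test can register a fake key without touching the environment.
    api_key = ctx.get_dependency("serper_api_key")
    return f"searched {query!r} with key ending {api_key[-4:]}"


registry = Registry()
registry.set_dependency("serper_api_key", "test-key-1234")
print(search_step(registry.context(), "Lisbon hotels"))
# searched 'Lisbon hotels' with key ending 1234
```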
What the SDK handles vs. what you write
| SDK handles | You write |
|---|---|
| Checkpointing each `ctx.lfc(...)` return value in the durable promise store | The `yield ctx.lfc(...)` calls and the child functions (`interact_with_llm`, `search_internet`, `scrape_website`, `chat_with_user`) |
| Suspending the generator after each `yield` and resuming when the child promise resolves | The straight-line `while True:` loop body: append message, branch on `tool_calls` vs `content`, break on the completion sentinel |
| Replaying the generator after a worker crash using cached step values rather than re-issuing the LLM, Serper, and Browserless calls | No replay code; the loop is written as if it never crashes |
| Identifying the workflow run by the caller-supplied trip id and deduplicating concurrent runs against the same id (returning a handle to the existing promise rather than starting a second run) | The `trip_id = input(...)` and `travel_assistent.run(trip_id)` calls (`src/agent.py:97-98`) |
| Routing the OpenAI client and API keys into each child function via the dependency registry | The three `resonate.set_dependency(...)` lines in module init (`src/agent.py:21-23`) |
| Retrying failed child functions under the same step id using the default `Exponential()` retry policy for non-generator functions (`resonate-sdk-py/resonate/options.py:23`) | Error-return branches inside the tools (`src/tools.py:59-60`, `:87-88`, `:89-90`, `:92-95`) for cases the LLM should see and recover from rather than retry |

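
Deduplication by a caller-supplied id amounts to get-or-create on the promise id. The sketch below models it in-process with hypothetical `PromiseStore`, `Handle`, and `run(promise_id, start)` names. In Resonate the server holds the promise, not the worker, so treat this only as an illustration of the semantics.

```python
import threading


class Handle:
    """Handle to a (toy) durable promise; result() blocks until it is resolved."""

    def __init__(self, promise_id):
        self.promise_id = promise_id
        self._done = threading.Event()
        self._value = None

    def resolve(self, value):
        self._value = value
        self._done.set()

    def result(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError(f"promise {self.promise_id} not resolved")
        return self._value


class PromiseStore:
    """Get-or-create by id: at most one run per promise id, however many callers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._promises = {}
        self.started = []  # ids for which work was actually started

    def run(self, promise_id, start):
        with self._lock:
            existing = self._promises.get(promise_id)
            if existing is not None:
                return existing  # duplicate invocation: attach to the existing run
            handle = Handle(promise_id)
            self._promises[promise_id] = handle
            self.started.append(promise_id)
        start(handle)  # only the first caller for this id starts the work
        return handle


promises = PromiseStore()
first = promises.run("trip-42", lambda h: h.resolve("itinerary v1"))
second = promises.run("trip-42", lambda h: h.resolve("itinerary v2"))
print(first is second, first.result(timeout=1), promises.started)
# True itinerary v1 ['trip-42']
```

The second `start` callback never runs: the duplicate caller receives the first caller's handle and sees its result.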
The orchestrator body is a single `while True:` loop with three branches (tool call, finished, ask user). Several behaviours live in the SDK and server, not in the code the author wrote:

- the retry semantics
- rebuilding the intermediate `messages` state from cached step results
- resuming after a crash
- identity-based deduplication of concurrent runs
Failure modes covered
- Worker crashes between LLM turns. The workflow's promise id is the caller-supplied `trip_id` (`src/agent.py:97-98`). On worker restart the workflow is replayed. Every `ctx.lfc` call that completed and checkpointed before the crash returns its cached value from the promise store. The OpenAI call from a prior turn is not re-issued, the Serper search is not re-charged, and the Browserless scrape is not re-fetched. Only the call that was in flight at crash time (or had not yet checkpointed) actually re-executes.
- A tool call (Serper or Browserless) fails transiently. The tools turn known failures into error strings instead of exceptions:
  - `search_internet` returns a plain string when the response has no `organic` key (`src/tools.py:59-60`).
  - `scrape_website` returns `"error scraping website content: ..."` on a non-200 response (`src/tools.py:87-88`) and on an empty body (`:89-90`).
  - The HTML parser is wrapped in `try/except` and returns `"error parsing HTML: ..."` (`src/tools.py:92-95`).

  The LLM receives the error string as the tool result and can decide to retry the call, scrape a different URL, or ask the user. A single bad tool result does not crash the agent.
- A tool call raises an unhandled exception. Because the tool is invoked as `ctx.lfc(fn, ...)`, the exception is caught by the SDK rather than by the workflow body. The default options apply `Exponential()` to non-generator functions and `Never()` to generator functions (`resonate-sdk-py/resonate/options.py:23`). Since `search_internet`, `scrape_website`, and `chat_with_user` are plain functions, they are retried under `Exponential()` with the same step id. Once retries are exhausted, the exception propagates back into the generator at the `yield ctx.lfc(...)` site, where the workflow body can catch it. Earlier LLM and tool results stay cached at their checkpoints; only the failing step retries.
- Duplicate workflow invocation with the same `trip_id`. The Resonate server keys the workflow by its promise id. A second `travel_assistent.run(trip_id)` against the same id resolves against the existing promise rather than starting a second concurrent planning session.
- Long human idle time. `chat_with_user` (`src/agent.py:93-94`) calls Python's blocking `input()`. Because it runs inside `ctx.lfc`, the call is a single durable step. The worker is occupied while waiting for input. If the worker is killed and restarted, prior LLM and tool results are not lost; only this step is re-entered. (A production human-in-the-loop variant would instead use a `ctx.promise(...)` resolved by an external HTTP gateway; the current example uses the blocking `input()` path.)
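The two tool-failure paths above can be separated in a small sketch. Expected failures are returned as strings the LLM can read. Unexpected exceptions are retried with exponential backoff and re-raised once attempts run out. `retry_exponential`, `scrape_step`, the error wording, and the attempt count and base delay are hypothetical choices for illustration, not the SDK's `Exponential()` defaults or the example's `scrape_website`.

```python
import time


def retry_exponential(fn, *args, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call fn(*args); on an exception, wait base_delay * 2**n and try again.

    After `attempts` failed calls the last exception is re-raised, which is
    where it would surface at the workflow's `yield ctx.lfc(...)` site.
    """
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2**attempt)


def scrape_step(fetch, url):
    """Hypothetical tool: known failure modes become strings the LLM can read."""
    status, body = fetch(url)  # an exception here (e.g. a network error) is retried
    if status != 200:
        return f"scrape failed: HTTP {status}"
    if not body:
        return "scrape failed: empty body"
    return body


# Demo: two transient network errors, then success on the third attempt.
delays = []
seen = []


def flaky_fetch(url):
    seen.append(url)
    if len(seen) < 3:
        raise ConnectionError("transient network error")
    return 200, "<html>ok</html>"


body = retry_exponential(scrape_step, flaky_fetch, "https://example.com", sleep=delays.append)
print(body, delays)  # <html>ok</html> [0.1, 0.2]
```

A non-200 response is not retried at all: it comes back as an ordinary string result. Only raised exceptions consume retry attempts.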
The example does not implement provider-side idempotency on the OpenAI, Serper, or Browserless calls — that is outside the workflow's scope and is not claimed.
When to reach for this pattern
- If you are running a multi-turn LLM tool-calling loop where the conversation has accumulated state (paid LLM tokens, scraped pages, gathered facts) you cannot afford to redo on a crash.
- If you want straight-line agent code (a `while True:` loop over `interact_with_llm` and tool dispatch) instead of writing a custom checkpoint-on-disk layer per turn.
- If the agent interacts with third-party APIs you pay per call (LLM providers, search APIs, scraping APIs) and re-running them on retry would be wasteful or rate-limit-burning.
- If the agent blocks on human input in any form and you want that wait to be a durable step rather than process-bound in-memory state.
- If you want per-step retry of any external call (LLM, search, scrape, user input) without writing per-call retry decorators or try/except scaffolding.
Sources
- Example repo: https://github.com/resonatehq-examples/example-ai-travel-assistant-py
- Python SDK repo: https://github.com/resonatehq/resonate-sdk-py
- Resonate documentation: https://docs.resonatehq.io
- Files cited in this post:
  - `src/agent.py:20-23`: Resonate client construction and dependency wiring
  - `src/agent.py:26-90`: the `travel_assistent` generator workflow
  - `src/agent.py:93-94`: `chat_with_user` (blocking `input()` step)
  - `src/agent.py:96-101`: `main()` entry point (`travel_assistent.run(trip_id)`, `handle.result()`)
  - `src/llm.py:4-32`: `interact_with_llm` OpenAI chat-completions call
  - `src/tools.py:6-45`: OpenAI tools schema
  - `src/tools.py:48-70`: `search_internet` (Serper)
  - `src/tools.py:73-100`: `scrape_website` (Browserless + `unstructured` HTML partition)
  - `resonate-sdk-py/resonate/options.py:23`: default retry policy (`Exponential()` for non-generator, `Never()` for generator functions)
  - `pyproject.toml:10`: declares `resonate-sdk>=0.6.7`
  - `uv.lock`: records its own `specifier = ">=0.5.3"` for `resonate-sdk` and pins `version = "0.5.3"`. The committed lockfile was generated against a looser constraint than the current `pyproject.toml`.

Both `0.5.3` and `0.6.7` predate the post-0.10 brand-aligned API surface. `Resonate.remote()` (used at `src/agent.py:20`) has since been removed: `resonate-sdk-py/tests/test_auth.py:255-257` asserts `not hasattr(Resonate, "remote")`. Agents lifting this code onto a current SDK release should re-verify the constructor shape, the `ctx.lfc` surface, and the dependency-registry API against current SDK docs.
