Resonate HQ · 6 min read

Durable LLM tool-call loop for a travel assistant in Python on Resonate

A multi-turn tool-using LLM agent expressed as one generator function with three branches, where every LLM call, web call, and user prompt is a Resonate durable checkpoint.


A multi-turn LLM agent that interleaves model calls, third-party API calls (Serper, Browserless), and blocking user input must survive a worker restart mid-conversation without re-paying for completed LLM calls or losing prior turns. With Resonate, you register the agent loop as a generator workflow and run every external interaction (LLM completion, web search, page scrape, user input) as a durable child step via ctx.lfc. Each completed step's result is checkpointed in the promise store and replayed from cache after a crash. The example wires an OpenAI chat-completions tool-calling loop with two tools (internet_search, scrape_website) and a synchronous input() prompt for the human; all four interactions run through the same durable primitive.

The shape of the solution

@resonate.register
def travel_assistent(ctx):
    messages = [
        {"role": "system", "content": "..."},  # system prompt elided
        {"role": "user", "content": "Plan a trip for me."}  # whitespace trimmed
    ]
 
    while True:
        message = yield ctx.lfc(interact_with_llm, messages)
        # Always add the assistant response
        assistant_message = {"role": "assistant", "content": message["content"]}
        if message.get("tool_calls"):
            assistant_message["tool_calls"] = [
                # ... tool_calls dict construction elided
            ]
        messages.append(assistant_message)
        content = message.get("content")
        if content and "TRIP PLANNING COMPLETE" in content:
            break
        elif message["tool_calls"]:
            for tool_call in message["tool_calls"]:
                tool_name = tool_call["name"]
                args = tool_call["args"]
                if tool_name == "internet_search":
                    result = yield ctx.lfc(search_internet, args["search_query"], args.get("num_results", 5))
                elif tool_name == "scrape_website":
                    result = yield ctx.lfc(scrape_website, args["url"])
                else:
                    result = "Unknown tool call"
                messages.append({"role": "tool", "tool_call_id": tool_call["id"], "content": result})
        elif content:
            input_message = yield ctx.lfc(chat_with_user, content)
            messages.append({"role": "user", "content": input_message})
 
    return message["content"]
# from example-ai-travel-assistant-py/src/agent.py:26-90

The orchestrator is a generator function, not async def. Every yield ctx.lfc(fn, ...) runs fn as a durable child step in the same worker process; once the child completes, its return value is persisted at the call site and replayed from the promise store on retry, so a question that the LLM has already answered is not re-asked across a worker crash.
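The replay behaviour described above can be illustrated with a toy driver that does not use the Resonate SDK at all. This is a minimal sketch under stated assumptions: `run_with_replay`, the `(function, args)` step tuples, and the dict used as a "promise store" are all hypothetical stand-ins, and the real SDK's checkpointing and step identification are more involved. It only shows why a generator re-run from the top can skip steps whose results were already stored.

```python
from typing import Any, Callable, Dict, Generator, Tuple

# Hypothetical step shape: (function, args). The SDK's ctx.lfc returns its own object.
Step = Tuple[Callable[..., Any], tuple]


def run_with_replay(workflow: Callable[[], Generator[Step, Any, Any]],
                    store: Dict[int, Any]) -> Any:
    """Drive a generator workflow, caching each step result by its position."""
    gen = workflow()
    index = 0
    try:
        step = next(gen)
        while True:
            if index not in store:
                fn, args = step
                store[index] = fn(*args)  # first execution: run and checkpoint
            # Resume the generator with the stored value (fresh or replayed).
            step = gen.send(store[index])
            index += 1
    except StopIteration as done:
        return done.value


calls = []
crash_once = {"armed": True}


def ask_llm(prompt: str) -> str:
    calls.append(("llm", prompt))
    return f"answer to {prompt}"


def search(query: str) -> str:
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated worker crash")
    calls.append(("search", query))
    return f"results for {query}"


def workflow():
    answer = yield (ask_llm, ("where to go?",))
    results = yield (search, (answer,))
    return results


store: Dict[int, Any] = {}
try:
    run_with_replay(workflow, store)
except RuntimeError:
    pass  # the "worker" died mid-step; the store still holds step 0

final = run_with_replay(workflow, store)  # second run replays step 0 from the store
```

On the second run the generator starts again from its first line, but `ask_llm` is not invoked a second time because its result is already in the store; only `search`, which never checkpointed, actually executes.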

The durable primitives in play

  • Resonate.remote() — constructs the Resonate client wired to a remote Resonate server (the worker polls the server for tasks). src/agent.py:20.
  • resonate.set_dependency(name, obj) — registers worker-process-scoped objects (the OpenAI client, the Serper API key, the Browserless API key) so child functions can retrieve them via ctx.get_dependency(...) instead of reading globals. src/agent.py:21, :22, :23.
  • @resonate.register — registers the top-level workflow under a name so the worker can claim and execute it under a caller-supplied promise id. src/agent.py:26.
  • yield ctx.lfc(fn, *args) — Local Function Call. Runs fn as a durable child step inside the same worker; its return value is checkpointed in the promise store and replayed from cache on retry. Used for the LLM call (src/agent.py:56), the Serper search (:80), the Browserless scrape (:82), and the blocking user input() (:87).
  • ctx.get_dependency(name) — retrieves a worker-scoped dependency from inside a child function. src/llm.py:6, src/tools.py:49, src/tools.py:75.
  • Function.run(id) — top-level entry point. The caller supplies the trip id (trip_id) which becomes the workflow's promise id; a second invocation with the same id returns a handle to the same promise rather than starting a parallel run. src/agent.py:98.
  • handle.result() — blocks until the durable promise resolves and returns the workflow's return value. src/agent.py:99.
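To make the dependency-registry idea concrete, here is a small sketch of a child function that reads its API key through `ctx.get_dependency(...)` instead of a module-level global. It is illustrative only: `StubContext` is a hypothetical stand-in for the SDK context (only `get_dependency` is modelled), the dependency name `"serper_api_key"` and the request field names are assumptions, and `build_search_request` is not a function from the example repo.

```python
from typing import Any, Dict


class StubContext:
    """Hypothetical stand-in for the SDK context; only get_dependency is modelled."""

    def __init__(self, dependencies: Dict[str, Any]) -> None:
        self._dependencies = dict(dependencies)

    def get_dependency(self, name: str) -> Any:
        return self._dependencies[name]


def build_search_request(ctx: StubContext, search_query: str,
                         num_results: int = 5) -> Dict[str, Any]:
    """Build (but do not send) a search request using a worker-scoped key."""
    api_key = ctx.get_dependency("serper_api_key")  # assumed dependency name
    return {
        "headers": {"X-API-KEY": api_key},                # assumed header name
        "json": {"q": search_query, "num": num_results},  # assumed body fields
    }


# Usage: a test can inject a fake key without touching globals or the environment.
ctx = StubContext({"serper_api_key": "test-key"})
request = build_search_request(ctx, "hotels in Lisbon")
```

One reason to prefer this shape is testability: the child function depends only on what the context hands it, so it can be exercised with a stub in place of a live worker.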

What the SDK handles vs. what you write

| SDK handles | You write |
| --- | --- |
| Checkpointing each ctx.lfc(...) return value in the durable promise store | The yield ctx.lfc(...) calls and the child functions (interact_with_llm, search_internet, scrape_website, chat_with_user) |
| Suspending the generator after each yield and resuming when the child promise resolves | The straight-line while True: loop body: append message, branch on tool_calls vs content, break on the completion sentinel |
| Replaying the generator after a worker crash using cached step values rather than re-issuing the LLM, Serper, and Browserless calls | No replay code; the loop is written as if it never crashes |
| Identifying the workflow run by the caller-supplied trip id and deduplicating duplicate concurrent runs against the same id (returns a handle to the existing promise rather than starting a second run) | The trip_id = input(...) and travel_assistent.run(trip_id) call (src/agent.py:97-98) |
| Routing the OpenAI client + API keys into each child function via the dependency registry | The three resonate.set_dependency(...) lines in module init (src/agent.py:21-23) |
| Retrying failed child functions under the same step id using the default Exponential() retry policy for non-generator functions (resonate-sdk-py/resonate/options.py:23) | Error-return branches inside the tools (src/tools.py:59-60, :87-88, :89-90, :92-95) for cases the LLM should see and recover from rather than retry |

The orchestrator body is one while True: with three branches (tool call, finished, ask user). The retry semantics, the cached intermediate messages state, the resume-after-crash behaviour, and the identity-based deduplication of concurrent runs all sit in the SDK + server, not in the code the author wrote.
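The identity-based deduplication can be pictured with a toy in-memory store keyed by the caller-supplied id. This is a sketch under assumptions, not how the Resonate server is implemented: `ToyPromiseStore` is hypothetical, it runs the function synchronously, and it is not thread-safe. It only demonstrates the contract that a second run against the same id returns the existing promise.

```python
from concurrent.futures import Future
from typing import Any, Callable, Dict


class ToyPromiseStore:
    """In-memory stand-in for a promise store keyed by caller-supplied id."""

    def __init__(self) -> None:
        self._promises: Dict[str, Future] = {}
        self.started = 0  # how many runs actually began

    def run(self, promise_id: str, fn: Callable[[], Any]) -> Future:
        existing = self._promises.get(promise_id)
        if existing is not None:
            return existing  # same id: hand back the same promise
        promise: Future = Future()
        self._promises[promise_id] = promise
        self.started += 1
        promise.set_result(fn())  # toy: run synchronously in the caller
        return promise


toy_store = ToyPromiseStore()
first = toy_store.run("trip-42", lambda: "itinerary")
second = toy_store.run("trip-42", lambda: "a different itinerary")
```

Here `second` is the same object as `first`, and the second lambda never runs, which mirrors the "handle to the existing promise" behaviour described in the table.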

Failure modes covered

  • Worker crashes between LLM turns. The orchestrator is registered under the caller-supplied trip_id (src/agent.py:97-98). On worker restart the workflow is replayed; every ctx.lfc call that completed and checkpointed before the crash returns its cached value from the promise store, so the OpenAI call from a prior turn is not re-issued, the Serper search is not re-charged, and the Browserless scrape is not re-fetched. Only the call that was in-flight at crash time (or had not yet checkpointed) actually re-executes.
  • A tool call (Serper or Browserless) fails transiently. search_internet returns a plain string on a missing organic key (src/tools.py:59-60); scrape_website returns "error scraping website content: ..." on non-200 (src/tools.py:87-88) and on an empty body (:89-90); the HTML parser is wrapped in try/except and returns "error parsing HTML: ..." (src/tools.py:92-95). The LLM receives the error string as the tool result and can decide to retry the call, scrape a different URL, or ask the user. The agent does not crash on a single bad tool result.
  • A tool call raises an unhandled exception. Because the tool is invoked as ctx.lfc(fn, ...), the exception is caught by the SDK rather than by the workflow body. The default options apply Exponential() to non-generator functions and Never() to generator functions (resonate-sdk-py/resonate/options.py:23); since search_internet, scrape_website, and chat_with_user are plain functions, they are retried with Exponential() under the same step id. Once retries are exhausted, the exception propagates back into the generator at the yield ctx.lfc(...) site, where the workflow body can catch it. The earlier LLM and tool results stay cached at their prior checkpoints; only the failing step retries.
  • Duplicate workflow invocation with the same trip_id. The Resonate server keys the workflow by its promise id; a second travel_assistent.run(trip_id) against the same id resolves against the existing promise rather than starting a second concurrent planning session.
  • Long human idle time. chat_with_user (src/agent.py:93-94) calls Python's blocking input(). Because it runs inside ctx.lfc, the call is a single durable step: the worker is occupied while waiting for input, but if the worker is killed and restarted, prior LLM and tool results are not lost and only this step is re-entered. (A production human-in-the-loop design would replace the blocking input() with a ctx.promise(...) resolved by an external HTTP gateway; the current example uses the blocking input() path.)
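Two of the failure modes above differ only in how the tool reports the problem: an expected failure is returned as a string the LLM can read, while an unexpected one is raised and retried. The sketch below illustrates that split with generic Python. It is not the SDK's Exponential() policy and not the repo's tool code: `fetch_page`, `call_with_backoff`, the error wording, and the delay parameters are all hypothetical.

```python
import time
from typing import Any, Callable, List


def fetch_page(status_code: int, body: str) -> str:
    """Tool-style function: expected failures come back as strings for the LLM."""
    if status_code != 200:
        return f"tool error: unexpected HTTP status {status_code}"
    if not body:
        return "tool error: empty response body"
    return body


def call_with_backoff(fn: Callable[..., Any], *args: Any, attempts: int = 3,
                      base_delay: float = 0.01,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry fn on any exception, doubling the delay; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: let the caller see the exception
            sleep(base_delay * (2 ** attempt))


# Usage: a function that fails twice with a transient error, then succeeds.
attempts_seen: List[int] = []


def flaky() -> str:
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("transient")
    return "ok"


delays: List[float] = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky, attempts=3, base_delay=0.5, sleep=delays.append)
soft_failure = fetch_page(503, "")
```

The returned-string path never triggers a retry, so the model sees the failure on the next turn; the raised-exception path is invisible to the model unless every attempt fails.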

The example does not implement provider-side idempotency on the OpenAI, Serper, or Browserless calls — that is outside the workflow's scope and is not claimed.

When to reach for this pattern

  • If you are running a multi-turn LLM tool-calling loop where the conversation has accumulated state (paid LLM tokens, scraped pages, gathered facts) you cannot afford to redo on a crash.
  • If you want straight-line agent code — a while True: over interact_with_llm and tool dispatch — instead of writing a custom checkpoint-on-disk layer per turn.
  • If the agent interacts with third-party APIs you pay per call (LLM providers, search APIs, scraping APIs) and re-running them on retry would be wasteful or rate-limit-burning.
  • If the agent blocks on human input in any form and you want that wait to be a durable step rather than process-bound in-memory state.
  • If you want per-step retry of any external call (LLM, search, scrape, user input) without writing per-call retry decorators or try/except scaffolding.

Sources

  • Example repo: https://github.com/resonatehq-examples/example-ai-travel-assistant-py
  • Python SDK repo: https://github.com/resonatehq/resonate-sdk-py
  • Resonate documentation: https://docs.resonatehq.io
  • Files cited in this post:
    • src/agent.py:20-23: Resonate client construction and dependency wiring
    • src/agent.py:26-90: the travel_assistent generator workflow
    • src/agent.py:93-94: chat_with_user (blocking input() step)
    • src/agent.py:96-101: main() entry point (travel_assistent.run(trip_id), handle.result())
    • src/llm.py:4-32: interact_with_llm OpenAI chat-completions call
    • src/tools.py:6-45: OpenAI tools schema
    • src/tools.py:48-70: search_internet (Serper)
    • src/tools.py:73-100: scrape_website (Browserless + unstructured HTML partition)
    • resonate-sdk-py/resonate/options.py:23: default retry policy (Exponential() for non-generator, Never() for generator functions)
    • pyproject.toml:10: declares resonate-sdk>=0.6.7
    • uv.lock: records its own specifier = ">=0.5.3" for resonate-sdk and pins version = "0.5.3"; the committed lockfile was generated against a looser constraint than the current pyproject.toml. Both 0.5.3 and 0.6.7 predate the post-0.10 brand-aligned API surface: Resonate.remote() (used at src/agent.py:20) has since been removed (resonate-sdk-py/tests/test_auth.py:255-257 asserts not hasattr(Resonate, "remote")). Agents lifting this code onto a current SDK release should re-verify the constructor shape, the ctx.lfc surface, and the dependency-registry API against current SDK docs.