May 25, 2026 · 5 min read · Resonate HQ

Durable Hacker News research agent in Python on Resonate

How a long-running keyword-monitoring agent collapses to a generator workflow when every yield is a durable checkpoint and the promise store IS the state store.

python agent durable-sleep infinite-workflow for-agents

A long-running monitor that polls an external API, scores each result with an LLM, and emits notifications has two restart hazards: it must not re-analyze stories after a crash, and it must not run an immediate, redundant scan when restarted mid-interval. Resonate's answer is a generator workflow in which every yield ctx.run(...) is a durable checkpoint and every yield ctx.sleep(...) is a durable timer, so the dedup set rebuilds from cached step results on replay and the inter-round interval survives restarts. The example registers two generator functions, scan_keyword (one round for one keyword) and monitor_hackernews (the infinite loop that owns the dedup set), and uses Resonate dependencies to inject the OpenAI client and config into worker functions.

The shape of the solution

@resonate.register
def monitor_hackernews(ctx: Context):
    """
    Continuous monitoring loop.
 
    Owns the `seen_ids` set. On crash-recovery Resonate replays this generator
    and returns cached results for completed `scan_keyword` calls, so `seen_ids`
    rebuilds deterministically — the promise store IS the state store.
 
    `yield ctx.sleep(...)` between rounds is a durable timer: a restart during
    sleep resumes the sleep rather than triggering an immediate redundant scan.
    """
    config: AgentConfig = ctx.get_dependency("config")
    keywords = config["keywords"]
    scan_interval_secs = config["scan_interval_secs"]
    # ...
 
    seen_ids: set[str] = set()
 
    while True:
        for keyword in keywords:
            try:
                result = yield ctx.run(scan_keyword, keyword, list(seen_ids))
                for a in result["newly_analyzed"]:
                    seen_ids.add(a["story_id"])
            except Exception as e:
                print(f"❌ Error scanning '{keyword}': {e}")
 
        yield ctx.sleep(scan_interval_secs)
# from example-hackernews-research-agent-py/src/agent.py:220-253

The inner scan_keyword workflow has the same shape, with three yield ctx.run(...) call sites (fetch, per-story analyze inside a loop, notify) under @resonate.register:

@resonate.register
def scan_keyword(
    ctx: Context,
    keyword: str,
    seen_ids: Optional[list[str]] = None,
):
    # ...
    config: AgentConfig = ctx.get_dependency("config")
    relevance_threshold = config["relevance_threshold"]
 
    stories = yield ctx.run(search_hackernews, keyword)
    seen = set(seen_ids or [])
    new_stories = [s for s in stories if s["objectID"] not in seen]
 
    # ...
 
    newly_analyzed = []
    for story in new_stories:
        analysis = yield ctx.run(analyze_story, story, keyword)
        newly_analyzed.append(analysis)
 
    interesting = [
        a for a in newly_analyzed
        if a["is_interesting"] and a["relevance_score"] >= relevance_threshold
    ]
 
    yield ctx.run(notify_findings, interesting, keyword)
 
    # ...
 
    return {
        "keyword": keyword,
        "stories_found": len(stories),
        "newly_analyzed": newly_analyzed,
    }
# from example-hackernews-research-agent-py/src/agent.py:172-217

Both workflows are plain Python generator functions, not async def. Each yield ctx.run(...) runs a child step under a durable promise and suspends the generator until that step's result is checkpointed.
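
To make the replay idea concrete, here is a minimal pure-Python sketch of a driver that checkpoints each yielded step in a list and replays cached results after a simulated crash. It is a toy model, not the Resonate SDK: Step, drive, the in-memory log, and the crash_after parameter are all hypothetical names invented for illustration, and a real promise store is persistent and keyed by promise id rather than by position.

```python
from typing import Any, Callable, Generator, List, Optional


class Step:
    """A request to run fn(*args) as a checkpointed step (hypothetical, not the Resonate API)."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args


def drive(
    workflow: Callable[[], Generator],
    log: List[Any],
    crash_after: Optional[int] = None,
) -> Any:
    """Run a generator workflow, serving already-logged step results from `log`.

    `crash_after` simulates a worker crash once that many step bodies
    have actually executed in this run.
    """
    gen = workflow()
    position = 0  # index of the step the generator is currently waiting on
    executed = 0  # step bodies really run in this process
    try:
        step = next(gen)
        while True:
            if position < len(log):
                result = log[position]  # replay: cached result, body is skipped
            else:
                if crash_after is not None and executed >= crash_after:
                    raise RuntimeError("simulated worker crash")
                result = step.fn(*step.args)
                log.append(result)  # checkpoint before resuming the generator
                executed += 1
            position += 1
            step = gen.send(result)
    except StopIteration as done:
        return done.value


calls = []  # records which step bodies actually ran


def fetch(keyword: str) -> List[str]:
    calls.append(("fetch", keyword))
    return ["101", "102"]


def analyze(story_id: str) -> dict:
    calls.append(("analyze", story_id))
    return {"story_id": story_id, "relevance_score": 0.9}


def scan() -> Generator:
    stories = yield Step(fetch, "python")
    analyzed = []
    for story_id in stories:
        analysis = yield Step(analyze, story_id)
        analyzed.append(analysis)
    return [a["story_id"] for a in analyzed]


log: List[Any] = []
try:
    drive(scan, log, crash_after=2)  # fetch and the first analyze finish, then "crash"
except RuntimeError:
    pass

result = drive(scan, log)  # restart: two steps replay from the log, one runs fresh
```

After the second drive call, calls contains each step body exactly once even though the generator ran from the top twice. That is the property the workflows above rely on: local variables are rebuilt by re-running the generator, while side-effecting step bodies are skipped whenever a result is already checkpointed.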

The durable primitives in play

Resonate() — constructs the Resonate client embedded in the worker process. src/agent.py:25.
@resonate.register — registers a generator function as a top-level workflow the worker can claim and execute under a caller-supplied promise id. Applied to scan_keyword at src/agent.py:172 and monitor_hackernews at src/agent.py:220.
ctx.run(fn, *args) — runs fn as a durable child step. The return value is persisted; on replay the SDK returns the cached value rather than re-invoking the function. Used at src/agent.py:192 (fetch), :201 (per-story analyze), :209 (notify), and :247 (parent invokes the registered scan_keyword workflow as a step).
ctx.sleep(seconds) — durable timer. The pending wake-up is stored on the server, so a worker restart during the wait resumes the remainder of the sleep instead of restarting it from zero. src/agent.py:253.
resonate.set_dependency(name, value) — registers a worker-process value (OpenAI client, config dict) under a string key so durable functions retrieve it from ctx instead of closing over module-level state. src/agent.py:273-274.
ctx.get_dependency(name) — fetches a registered dependency inside a workflow or step function. src/agent.py:58 (openai inside analyze_story), :117 (config inside notify_findings), :189 (config inside scan_keyword), :232 (config inside monitor_hackernews).
resonate.start() / resonate.stop() — start and stop the worker's polling loop against the Resonate server. src/agent.py:284, :290.
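
The durable-timer behavior described for ctx.sleep can be sketched by storing an absolute wake-up deadline once and computing the remaining wait from it on every (re)start. This is an illustration of the idea only: TimerStore, remaining_sleep, and the timer id string are hypothetical, and how the Resonate server actually represents timers is not shown in the source.

```python
class TimerStore:
    """Stand-in for server-side timer storage (hypothetical)."""

    def __init__(self) -> None:
        self._deadlines = {}

    def get_or_create(self, timer_id: str, seconds: float, now: float) -> float:
        # The first call records an absolute deadline; later calls with the
        # same id return that original deadline and ignore `seconds`.
        return self._deadlines.setdefault(timer_id, now + seconds)


def remaining_sleep(store: TimerStore, timer_id: str, seconds: float, now: float) -> float:
    """Return how long the caller still has to wait for this timer."""
    deadline = store.get_or_create(timer_id, seconds, now)
    return max(0.0, deadline - now)


store = TimerStore()

# The workflow reaches its sleep at t=1000 and asks for a 300-second wait.
first = remaining_sleep(store, "monitor-1.sleep.0", 300, now=1000.0)

# The worker restarts at t=1120 and replays up to the same sleep.
after_restart = remaining_sleep(store, "monitor-1.sleep.0", 300, now=1120.0)

# A worker that comes back after the deadline does not wait at all.
late = remaining_sleep(store, "monitor-1.sleep.0", 300, now=1400.0)
```

Because the deadline is absolute and stored outside the worker, a restart neither restarts the interval from zero nor skips it.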

What the SDK handles vs. what you write

SDK handles	You write
Checkpointing each `ctx.run(...)` return value in the durable promise store	The four `yield ctx.run(...)` calls (`search_hackernews`, `analyze_story`, `notify_findings`, `scan_keyword`) and the step function bodies
Replaying the generator after a crash and returning cached results for completed steps, so `seen_ids` rebuilds in the same order	A plain Python `set[str]` named `seen_ids`, mutated normally inside `monitor_hackernews` (`src/agent.py:242, 249`)
Holding the inter-round wait as a durable timer that survives restarts	A single `yield ctx.sleep(scan_interval_secs)` (`src/agent.py:253`)
Routing the OpenAI client and config into each function call via the dependency registry	Two `resonate.set_dependency(...)` lines in `main()` (`src/agent.py:273-274`) and `ctx.get_dependency(...)` lookups in each function
Holding the workflow's identity under the caller-supplied promise id (`monitor-1`, `scan-1`) so a re-invoke resolves the existing promise rather than starting a duplicate	The `resonate invoke <id> --func <fn>` commands in the README (`README.md:77`, `:83`)
Walking the durable log on restart to rebuild generator state	No replay code — the workflow is written as if it never crashes

The author writes a while True loop, a for loop over keywords, a Python set, and a time.sleep-shaped ctx.sleep. Everything that makes the workflow durable — the checkpoints, the replay, the timer, the dependency wiring — is in the SDK and server.

Failure modes covered

Worker crash mid-scan. Each completed step's return value lives in the promise store; on restart the generator replays and the SDK returns cached results for the steps that already finished. Only the unfinished step re-enters its body. Code: src/agent.py:192, 201, 209. README: README.md:17-19.
Worker crash mid-interval. yield ctx.sleep(scan_interval_secs) (src/agent.py:253) is a durable timer. A restart during the wait resumes the remaining sleep rather than triggering an immediate redundant scan. README: README.md:25.
Re-processing a story across rounds. seen_ids is a set[str] local to monitor_hackernews (src/agent.py:242). After each scan_keyword call, every story_id from result["newly_analyzed"] is added (src/agent.py:248-249). On replay, the cached result dicts come back from the promise store and the same IDs land in seen_ids in the same order. The next scan_keyword call receives list(seen_ids) and filters fetched stories at src/agent.py:194. No external dedup database.
One keyword's scan raises. monitor_hackernews wraps the per-keyword ctx.run(scan_keyword, ...) in try/except Exception (src/agent.py:246-251) and continues to the next keyword. The inter-round ctx.sleep still runs.
Re-invoking with the same promise id. resonate invoke monitor-1 --func monitor_hackernews (README.md:83) supplies the promise id at the call site. A second invoke with the same id resolves against the existing promise instead of starting a parallel run.
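
The last failure mode, re-invoking with the same promise id, comes down to create-if-absent semantics on the id. The sketch below models that with an in-memory dictionary. ToyPromiseStore and its invoke method are hypothetical and do not mirror the Resonate server's API; only the ids monitor-1 and scan-1 and the function names come from the example.

```python
from typing import Dict, Tuple


class ToyPromiseStore:
    """In-memory model of id-keyed workflow identity (hypothetical)."""

    def __init__(self) -> None:
        self._promises: Dict[str, dict] = {}

    def invoke(self, promise_id: str, func_name: str) -> Tuple[dict, bool]:
        """Return (promise, created). A repeated id returns the existing promise."""
        created = promise_id not in self._promises
        if created:
            self._promises[promise_id] = {
                "id": promise_id,
                "func": func_name,
                "state": "pending",
            }
        return self._promises[promise_id], created

    def count(self) -> int:
        return len(self._promises)


store = ToyPromiseStore()

first, first_created = store.invoke("monitor-1", "monitor_hackernews")
second, second_created = store.invoke("monitor-1", "monitor_hackernews")
other, other_created = store.invoke("scan-1", "scan_keyword")
```

The second invoke returns the same promise object and creates nothing, so there is still one monitor-1 run; a different id starts a separate workflow.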

The example does not deduplicate against OpenAI or Slack. The README is explicit at README.md:125-127 that Slack delivery is at-least-once: notify_findings is one ctx.run, so a retry can re-POST. An analyze_story retry re-issues the OpenAI request. Both are accepted trade-offs, not bugs.

The README also flags at README.md:118-123 that seen_ids lives in worker memory and grows unbounded. Resonate replay cost on restart grows with it, so production monitors should bound the window or snapshot externally.

Finally, pyproject.toml:11 and uv.lock pin resonate-sdk at 0.6.7, which is on the 0.6.x Python SDK line. The TypeScript and Rust SDKs underwent post-0.10 / post-0.4 brand-aligned API renames; agents mapping this Python pattern onto a newer SDK release should re-verify the surface of @resonate.register, ctx.run, ctx.sleep, ctx.get_dependency, and resonate.set_dependency against current Python SDK docs before assuming the call shapes carry over unchanged.
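
One way to bound the dedup window is a fixed-capacity, insertion-ordered collection that evicts the oldest id when full. The sketch below is an assumption about how that could look, not code from the example repo: BoundedSeen and its methods are hypothetical names. Eviction depends only on the order of add calls, so a replay that feeds the same ids in the same order rebuilds the same window.

```python
from collections import OrderedDict
from typing import List


class BoundedSeen:
    """Keep at most `max_size` story ids, evicting the oldest first (hypothetical helper)."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._ids = OrderedDict()
        self._max_size = max_size

    def add(self, story_id: str) -> None:
        # Re-adding an id refreshes it, so recently seen stories are kept longest.
        self._ids[story_id] = None
        self._ids.move_to_end(story_id)
        while len(self._ids) > self._max_size:
            self._ids.popitem(last=False)  # drop the oldest entry

    def __contains__(self, story_id: str) -> bool:
        return story_id in self._ids

    def to_list(self) -> List[str]:
        # Oldest to newest; the order does not depend on string hashing.
        return list(self._ids)


seen = BoundedSeen(max_size=3)
for story_id in ["1", "2", "3", "4"]:
    seen.add(story_id)
window_after_four = seen.to_list()

seen.add("2")  # refresh an id that is already in the window
window_after_refresh = seen.to_list()
```

The trade-off is that an evicted id can be analyzed again if the search API still returns that story, so the capacity should comfortably exceed the number of stories a few rounds can surface.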

When to reach for this pattern

If you are running a continuous monitor or polling agent that scores each result with an LLM and must survive worker restarts without re-scoring previously-seen items.
If you want per-step retry on a long-running loop without retry decorators or external cursor storage — cached step results plus a local Python collection are enough.
If the interval between rounds matters (rate limits, cost) and a restart mid-wait must not collapse the interval — ctx.sleep is the durable timer.
If you want CLI / RFI (Remote Function Invocation) invocability of the inner scan unit on top of an infinite outer loop — both scan_keyword and monitor_hackernews are @resonate.register-ed, and scan_keyword's optional seen_ids arg makes it standalone-callable via resonate invoke ... --func scan_keyword --arg "<keyword>" (README.md:77).
If you need clean separation between ephemeral (OpenAI client, config) and durable state so the workflow stays free of unserializable closures — resonate.set_dependency / ctx.get_dependency is the seam.
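
The dependency seam in the last item can be illustrated with a toy registry: process-local objects are registered under string keys, and a step function looks them up through its context and returns only plain data. ToyRuntime, ToyContext, FakeScorer, and analyze_title are hypothetical stand-ins written for this sketch. The method names echo set_dependency and get_dependency from the example, but nothing here calls the real SDK or OpenAI.

```python
import json
from typing import Any, Dict


class ToyContext:
    """Gives step functions read access to registered dependencies (hypothetical)."""

    def __init__(self, deps: Dict[str, Any]) -> None:
        self._deps = deps

    def get_dependency(self, name: str) -> Any:
        return self._deps[name]


class ToyRuntime:
    """Holds process-local objects that never enter durable state (hypothetical)."""

    def __init__(self) -> None:
        self._deps: Dict[str, Any] = {}

    def set_dependency(self, name: str, value: Any) -> None:
        self._deps[name] = value

    def context(self) -> ToyContext:
        return ToyContext(self._deps)


class FakeScorer:
    """Stands in for an LLM client; scores by simple keyword match."""

    def score(self, title: str, keyword: str) -> float:
        return 0.9 if keyword.lower() in title.lower() else 0.1


def analyze_title(ctx: ToyContext, title: str, keyword: str) -> dict:
    # The client and config come from the context, not from module globals,
    # and the return value is plain data that can be checkpointed as JSON.
    scorer = ctx.get_dependency("scorer")
    threshold = ctx.get_dependency("config")["relevance_threshold"]
    score = scorer.score(title, keyword)
    return {"title": title, "relevance_score": score, "is_interesting": score >= threshold}


runtime = ToyRuntime()
runtime.set_dependency("scorer", FakeScorer())
runtime.set_dependency("config", {"relevance_threshold": 0.7})

hit = analyze_title(runtime.context(), "Durable execution in Python", "python")
miss = analyze_title(runtime.context(), "A new Rust web framework", "python")
serialized = json.dumps(hit)  # succeeds because the result holds no client objects
```

A side benefit of this shape is testability: swapping the registered object for a fake exercises the step function without network access.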

Sources

Example repo: https://github.com/resonatehq-examples/example-hackernews-research-agent-py
Python SDK repo: https://github.com/resonatehq/resonate-sdk-py
Resonate server: https://github.com/resonatehq/resonate
Resonate documentation: https://docs.resonatehq.io
Files cited in this post:
- src/agent.py:172-217 — scan_keyword generator workflow
- src/agent.py:220-253 — monitor_hackernews generator workflow (owns seen_ids, durable sleep)
- src/agent.py:40-49 — search_hackernews step
- src/agent.py:57-108 — analyze_story step (OpenAI client via ctx.get_dependency("openai"))
- src/agent.py:116-164 — notify_findings step (console + optional Slack POST)
- src/agent.py:261-290 — main() (dependency wiring, resonate.start() / resonate.stop())
- pyproject.toml:11, uv.lock — SDK pin resonate-sdk==0.6.7
- README.md:17-19, 25, 96-98, 118-127 — author's framing of replay, durable sleep, dedup, and known limits