A long-running monitor that polls an external API, scores each result with an LLM, and emits notifications must not re-analyze stories after a crash and must not collapse to an immediate redundant scan when restarted mid-interval. The Resonate shape of the solution is a generator workflow where every yield ctx.run(...) is a durable checkpoint and yield ctx.sleep(...) is a durable timer, so the dedup set rebuilds from cached step results on replay and the inter-round interval survives restarts. The example registers two generator functions — scan_keyword (one round for one keyword) and monitor_hackernews (the infinite loop that owns the dedup set) — and uses Resonate dependencies to inject the OpenAI client and config into worker functions.
The shape of the solution

    @resonate.register
    def monitor_hackernews(ctx: Context):
        """
        Continuous monitoring loop.
        Owns the `seen_ids` set. On crash-recovery Resonate replays this generator
        and returns cached results for completed `scan_keyword` calls, so `seen_ids`
        rebuilds deterministically — the promise store IS the state store.
        `yield ctx.sleep(...)` between rounds is a durable timer: a restart during
        sleep resumes the sleep rather than triggering an immediate redundant scan.
        """
        config: AgentConfig = ctx.get_dependency("config")
        keywords = config["keywords"]
        scan_interval_secs = config["scan_interval_secs"]
        # ...
        seen_ids: set[str] = set()
        while True:
            for keyword in keywords:
                try:
                    result = yield ctx.run(scan_keyword, keyword, list(seen_ids))
                    for a in result["newly_analyzed"]:
                        seen_ids.add(a["story_id"])
                except Exception as e:
                    print(f"❌ Error scanning '{keyword}': {e}")
            yield ctx.sleep(scan_interval_secs)
    # from example-hackernews-research-agent-py/src/agent.py:220-253

The inner `scan_keyword` workflow has the same shape: three `yield ctx.run(...)` calls (fetch, per-story analyze, notify) under `@resonate.register`:

    @resonate.register
    def scan_keyword(
        ctx: Context,
        keyword: str,
        seen_ids: Optional[list[str]] = None,
    ):
        # ...
        config: AgentConfig = ctx.get_dependency("config")
        relevance_threshold = config["relevance_threshold"]
        stories = yield ctx.run(search_hackernews, keyword)
        seen = set(seen_ids or [])
        new_stories = [s for s in stories if s["objectID"] not in seen]
        # ...
        newly_analyzed = []
        for story in new_stories:
            analysis = yield ctx.run(analyze_story, story, keyword)
            newly_analyzed.append(analysis)
        interesting = [
            a for a in newly_analyzed
            if a["is_interesting"] and a["relevance_score"] >= relevance_threshold
        ]
        yield ctx.run(notify_findings, interesting, keyword)
        # ...
        return {
            "keyword": keyword,
            "stories_found": len(stories),
            "newly_analyzed": newly_analyzed,
        }
    # from example-hackernews-research-agent-py/src/agent.py:172-217

Both workflows are plain Python generator functions, not `async def`. Each `yield ctx.run(...)` runs a child step under a durable promise and suspends the generator until that step's result is checkpointed.
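To make the replay idea concrete, here is a toy model in plain Python. It does not use the Resonate SDK: `ToyStep`, `drive`, `analyze`, and `toy_monitor` are hypothetical names invented for this sketch, and an in-memory list stands in for the durable promise store. It only illustrates how re-running a generator against cached step results can rebuild local state such as `seen_ids` without re-executing finished steps.

```python
from typing import Any, Callable, Generator, List, Optional


class ToyStep:
    """A request to run fn(*args) as one checkpointed step.

    Hypothetical stand-in for what `ctx.run(...)` yields; not the SDK's type.
    """

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args


def drive(workflow: Callable[[], Generator], log: List[Any],
          crash_after: Optional[int] = None) -> Any:
    """Run `workflow` to completion, checkpointing each step result in `log`.

    Steps whose results are already in `log` are answered from the log
    instead of being executed again. `crash_after` simulates a worker crash
    once that many steps have been freshly executed in this call.
    """
    gen = workflow()
    position = 0   # index of the step the generator is waiting on
    executed = 0   # steps actually run during this call (not replayed)
    try:
        step = next(gen)
        while True:
            if position < len(log):
                result = log[position]          # replay: cached result
            else:
                if crash_after is not None and executed == crash_after:
                    raise RuntimeError("simulated crash")
                result = step.fn(*step.args)    # first execution
                log.append(result)              # checkpoint
                executed += 1
            position += 1
            step = gen.send(result)
    except StopIteration as done:
        return done.value


calls: List[str] = []   # records which story ids were really analyzed


def analyze(story_id: str) -> dict:
    calls.append(story_id)
    return {"story_id": story_id}


def toy_monitor():
    seen_ids = set()
    for story_id in ["a1", "b2", "c3"]:
        result = yield ToyStep(analyze, story_id)
        seen_ids.add(result["story_id"])
    return sorted(seen_ids)


log: List[Any] = []
try:
    drive(toy_monitor, log, crash_after=2)      # crashes before the third step
except RuntimeError:
    pass
calls_before_restart = list(calls)
recovered = drive(toy_monitor, log)             # replays two steps, runs one
print(calls_before_restart, recovered, calls)
```

In this sketch the second `drive` call never re-invokes `analyze` for the first two ids, yet `seen_ids` ends up complete. The real SDK keys results by promise id rather than by list position, so treat the positional log as a simplification.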
The durable primitives in play
- `Resonate()`: constructs the Resonate client embedded in the worker process. `src/agent.py:25`.
- `@resonate.register`: registers a generator function as a top-level workflow the worker can claim and execute under a caller-supplied promise id. Applied to `scan_keyword` at `src/agent.py:172` and `monitor_hackernews` at `src/agent.py:220`.
- `ctx.run(fn, *args)`: runs `fn` as a durable child step. The return value is persisted; on replay the SDK returns the cached value rather than re-invoking the function. Used at `src/agent.py:192` (fetch), `:201` (per-story analyze), `:209` (notify), and `:247` (parent invokes the registered `scan_keyword` workflow as a step).
- `ctx.sleep(seconds)`: durable timer. The pending wake-up is stored on the server, so a worker restart during the wait resumes the remainder of the sleep instead of restarting it from zero. `src/agent.py:253`.
- `resonate.set_dependency(name, value)`: registers a worker-process value (OpenAI client, config dict) under a string key so durable functions retrieve it from `ctx` instead of closing over module-level state. `src/agent.py:273-274`.
- `ctx.get_dependency(name)`: fetches a registered dependency inside a workflow or step function. `src/agent.py:58` (`openai` inside `analyze_story`), `:117` (`config` inside `notify_findings`), `:189` (`config` inside `scan_keyword`), `:232` (`config` inside `monitor_hackernews`).
- `resonate.start()` / `resonate.stop()`: start and stop the worker's polling loop against the Resonate server. `src/agent.py:284`, `:290`.
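The "resumes the remainder of the sleep" behaviour of a durable timer can be pictured as storing an absolute deadline once and computing the remaining wait from it on every (re)entry. The sketch below is a toy model under that assumption, not how the Resonate server is implemented: `remaining_sleep`, the `timers` dict, and the timer id are hypothetical, and the clock is passed in explicitly so the example is deterministic.

```python
from typing import Dict


def remaining_sleep(timers: Dict[str, float], timer_id: str,
                    seconds: float, now: float) -> float:
    """Return how long the caller still has to wait for `timer_id`.

    The first call records an absolute deadline; later calls (for example
    after a restart) reuse that deadline instead of starting over.
    """
    deadline = timers.setdefault(timer_id, now + seconds)
    return max(0.0, deadline - now)


timers: Dict[str, float] = {}   # stands in for server-side timer storage

# First entry into the sleep at t=1000: the full interval remains.
first = remaining_sleep(timers, "round-1-sleep", 300.0, now=1000.0)

# The worker restarts 120 seconds into the wait and replays the same sleep.
after_restart = remaining_sleep(timers, "round-1-sleep", 300.0, now=1120.0)

# A restart after the deadline has passed does not wait at all.
late_restart = remaining_sleep(timers, "round-1-sleep", 300.0, now=1500.0)

print(first, after_restart, late_restart)
```

A plain `time.sleep(300)` would behave like a fresh `timers` dict on every restart, which is exactly the "restart from zero" case the durable timer avoids.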
What the SDK handles vs. what you write
| SDK handles | You write |
|---|---|
| Checkpointing each ctx.run(...) return value in the durable promise store | The four yield ctx.run(...) calls (search_hackernews, analyze_story, notify_findings, scan_keyword) and the step function bodies |
| Replaying the generator after a crash and returning cached results for completed steps, so seen_ids rebuilds in the same order | A plain Python set[str] named seen_ids, mutated normally inside monitor_hackernews (src/agent.py:242, 249) |
| Holding the inter-round wait as a durable timer that survives restarts | A single yield ctx.sleep(scan_interval_secs) (src/agent.py:253) |
| Routing the OpenAI client and config into each function call via the dependency registry | Two resonate.set_dependency(...) lines in main() (src/agent.py:273-274) and ctx.get_dependency(...) lookups in each function |
| Holding the workflow's identity under the caller-supplied promise id (monitor-1, scan-1) so a re-invoke resolves the existing promise rather than starting a duplicate | The resonate invoke <id> --func <fn> commands in the README (README.md:77, :83) |
| Walking the durable log on restart to rebuild generator state | No replay code — the workflow is written as if it never crashes |
The author writes a while True loop, a for loop over keywords, a Python set, and a time.sleep-shaped ctx.sleep. Everything that makes the workflow durable — the checkpoints, the replay, the timer, the dependency wiring — is in the SDK and server.
Failure modes covered
- Worker crash mid-scan. Each completed step's return value lives in the promise store; on restart the generator replays and the SDK returns cached results for the steps that already finished. Only the unfinished step re-enters its body. Code: `src/agent.py:192, 201, 209`. README: `README.md:17-19`.
- Worker crash mid-interval. `yield ctx.sleep(scan_interval_secs)` (`src/agent.py:253`) is a durable timer. A restart during the wait resumes the remaining sleep rather than triggering an immediate redundant scan. README: `README.md:25`.
- Re-processing a story across rounds. `seen_ids` is a `set[str]` local to `monitor_hackernews` (`src/agent.py:242`). After each `scan_keyword` call, every `story_id` from `result["newly_analyzed"]` is added (`src/agent.py:248-249`). On replay, the cached `result` dicts come back from the promise store and the same IDs land in `seen_ids` in the same order. The next `scan_keyword` call receives `list(seen_ids)` and filters fetched stories at `src/agent.py:194`. No external dedup database.
- One keyword's scan raises. `monitor_hackernews` wraps the per-keyword `ctx.run(scan_keyword, ...)` in `try/except Exception` (`src/agent.py:246-251`) and continues to the next keyword. The inter-round `ctx.sleep` still runs.
- Re-invoking with the same promise id. `resonate invoke monitor-1 --func monitor_hackernews` (`README.md:83`) supplies the promise id at the call site. A second invoke with the same id resolves against the existing promise instead of starting a parallel run.
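The last failure mode, re-invoking with the same promise id, is an idempotency-key pattern. The following toy model shows the idea only: `ToyPromiseStore`, `scan`, and the ids are hypothetical, and the sketch covers just the case where the first run has already completed. A real promise store also has to handle runs that are still pending (as `monitor-1` always is, since its loop never returns), which this sketch does not attempt.

```python
from typing import Any, Callable, Dict


class ToyPromiseStore:
    """Maps a caller-supplied promise id to the result of one run (hypothetical)."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.starts = 0   # how many times a function body was actually started

    def invoke(self, promise_id: str, fn: Callable[..., Any], *args: Any) -> Any:
        # Only the first invoke for a given id runs the function;
        # later invokes with the same id get the stored result back.
        if promise_id not in self._results:
            self.starts += 1
            self._results[promise_id] = fn(*args)
        return self._results[promise_id]


def scan(keyword: str) -> dict:
    return {"keyword": keyword, "stories_found": 3}


store = ToyPromiseStore()
first = store.invoke("scan-1", scan, "durable execution")
second = store.invoke("scan-1", scan, "durable execution")   # same id: no new run
third = store.invoke("scan-2", scan, "python")               # new id: new run
print(store.starts, first is second, third["keyword"])
```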
The example does not deduplicate against OpenAI or Slack. The README is explicit at `README.md:125-127` that Slack delivery is at-least-once: `notify_findings` is one `ctx.run`, so a retry can re-POST. An `analyze_story` retry re-issues the OpenAI request. Both are accepted trade-offs, not bugs.

The README also flags at `README.md:118-123` that `seen_ids` lives in worker memory and grows unbounded; Resonate replay cost on restart grows with it, so production monitors should bound the window or snapshot externally.

Finally, `pyproject.toml:11` and `uv.lock` pin `resonate-sdk` at 0.6.7, on the 0.6.x Python SDK line. The TypeScript and Rust SDKs underwent post-0.10 / post-0.4 brand-aligned API renames; agents mapping this Python pattern onto a newer SDK release should re-verify the surface of `@resonate.register`, `ctx.run`, `ctx.sleep`, `ctx.get_dependency`, and `resonate.set_dependency` against current Python SDK docs before assuming the call shapes carry over unchanged.
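One possible way to bound the `seen_ids` window mentioned above is an insertion-ordered set that evicts its oldest entries. This is a sketch of that idea, not something the example repo ships: `BoundedSeen` and `max_size` are hypothetical names. Eviction depends only on the order of `add` calls, so replaying the same sequence of additions yields the same contents, but whether that is sufficient for a given workflow is an assumption to verify.

```python
from collections import OrderedDict
from typing import List


class BoundedSeen:
    """Insertion-ordered id set that keeps at most `max_size` recent ids."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self._ids = OrderedDict()   # keys are ids; values are unused

    def add(self, story_id: str) -> None:
        self._ids[story_id] = None
        self._ids.move_to_end(story_id)        # re-adding refreshes recency
        while len(self._ids) > self.max_size:
            self._ids.popitem(last=False)      # evict the oldest id

    def __contains__(self, story_id: object) -> bool:
        return story_id in self._ids

    def to_list(self) -> List[str]:
        return list(self._ids)


seen = BoundedSeen(max_size=3)
for story_id in ["101", "102", "103", "104"]:
    seen.add(story_id)

print(seen.to_list(), "101" in seen, "104" in seen)
```

The trade-off is that a story evicted from the window and later returned by the search again would be analyzed a second time, so the window size has to be chosen against how far back the search results reach.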
When to reach for this pattern
- If you are running a continuous monitor or polling agent that scores each result with an LLM and must survive worker restarts without re-scoring previously-seen items.
- If you want per-step retry on a long-running loop without retry decorators or external cursor storage — cached step results plus a local Python collection are enough.
- If the interval between rounds matters (rate limits, cost) and a restart mid-wait must not collapse the interval: `ctx.sleep` is the durable timer.
- If you want CLI / RFI (Remote Function Invocation) invocability of the inner scan unit on top of an infinite outer loop: both `scan_keyword` and `monitor_hackernews` are `@resonate.register`-ed, and `scan_keyword`'s optional `seen_ids` arg makes it standalone-callable via `resonate invoke ... --func scan_keyword --arg "<keyword>"` (`README.md:77`).
- If you need clean separation between ephemeral (OpenAI client, config) and durable state so the workflow stays free of unserializable closures: `resonate.set_dependency` / `ctx.get_dependency` is the seam.
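The last point, keeping ephemeral objects out of durable state, can be illustrated without the SDK. In the sketch below `ToyContext`, `FakeClient`, and `analyze` are hypothetical stand-ins: the step receives only JSON-serializable arguments and returns a JSON-serializable result, while the client object is looked up by name at call time. It assumes, for illustration, that step arguments and results must survive a JSON round-trip.

```python
import json
from typing import Any, Dict


class FakeClient:
    """Stands in for a non-serializable API client (hypothetical)."""

    def score(self, title: str) -> int:
        return len(title)


class ToyContext:
    """Minimal context exposing only a dependency lookup (not the SDK's Context)."""

    def __init__(self, deps: Dict[str, Any]) -> None:
        self._deps = deps

    def get_dependency(self, name: str) -> Any:
        return self._deps[name]


def analyze(ctx: ToyContext, story: dict) -> dict:
    # The client is fetched from the context, never passed as an argument.
    client = ctx.get_dependency("client")
    return {
        "story_id": story["objectID"],
        "relevance_score": client.score(story["title"]),
    }


ctx = ToyContext({"client": FakeClient()})
story = {"objectID": "42", "title": "Durable execution"}

# Arguments and results round-trip through JSON, as a checkpoint would need.
result = analyze(ctx, json.loads(json.dumps(story)))
restored = json.loads(json.dumps(result))
print(restored)
```

Passing `FakeClient()` itself as a step argument would fail the same round-trip, which is the reason to route it through the dependency lookup instead.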
Sources
- Example repo: https://github.com/resonatehq-examples/example-hackernews-research-agent-py
- Python SDK repo: https://github.com/resonatehq/resonate-sdk-py
- Resonate server: https://github.com/resonatehq/resonate
- Resonate documentation: https://docs.resonatehq.io
- Files cited in this post:
  - `src/agent.py:172-217`: `scan_keyword` generator workflow
  - `src/agent.py:220-253`: `monitor_hackernews` generator workflow (owns `seen_ids`, durable sleep)
  - `src/agent.py:40-49`: `search_hackernews` step
  - `src/agent.py:57-108`: `analyze_story` step (OpenAI client via `ctx.get_dependency("openai")`)
  - `src/agent.py:116-164`: `notify_findings` step (console + optional Slack POST)
  - `src/agent.py:261-290`: `main()` (dependency wiring, `resonate.start()` / `resonate.stop()`)
  - `pyproject.toml:11`, `uv.lock`: SDK pin `resonate-sdk==0.6.7`
  - `README.md:17-19, 25, 96-98, 118-127`: author's framing of replay, durable sleep, dedup, and known limits
