Splitting the Pipeline: How We Made an AI Feed Bot Feel Fast

This continues the AI news bot series. Short version: Harry (my coding agent) and I built a feed that curates AI news from 27 sources, ranks it with an LLM, and publishes to a web app.

The feed was running three times a day: 9 AM, noon, and 7 PM. The output was good when it landed.

The problem: between runs, the feed felt dead.

AI moves fast. Papers drop, models ship, blog posts go live — and my feed just sat there showing the same 30 items for hours. By the time the next run fired, some of the “breaking” news was already old.

The obvious fix is to run more often. But that creates a cascade of engineering problems.


Why “just run more often” doesn’t work

Each full run does a lot:

  1. Crawl 27 sources — RSS feeds, sitemaps, API endpoints
  2. Fetch article content — full-text extraction for better summaries
  3. LLM processing — Claude classifies, scores, and generates summaries for each item
  4. Ranking — slot-based algorithm picks the top items
  5. Publish — commit artifacts to GitHub, Vercel auto-deploys
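
To make the cost argument concrete, here's the rough shape of a full run as sequential code. Every name below is illustrative rather than the project's actual code; the steps are passed in as callables to keep the sketch self-contained.

```python
# A minimal sketch of one full run. The step functions are injected because this
# is an outline of the structure, not the real implementation.
def full_run(sources, crawl, fetch_text, llm_enrich, slot_rank, publish):
    items = [item for src in sources for item in crawl(src)]  # 1. crawl 27 sources
    items = [fetch_text(item) for item in items]              # 2. full-text extraction
    items = [llm_enrich(item) for item in items]              # 3. LLM classify / score / summarize
    top = slot_rank(items)                                    # 4. slot-based ranking
    publish(top)                                              # 5. commit artifacts; Vercel redeploys
```

Every item passes through every step, and step 3 spends tokens on all of them, which is why cost scales directly with frequency.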

Running this every 30 minutes means:

  • LLM costs multiply — each run burns tokens on classification and summarization
  • Source pressure — hammering RSS feeds and sitemaps 48 times a day is rude (and some will rate-limit you)
  • Snapshot bloat — every run creates runtime artifacts in the repo; more runs = faster repo growth
  • Diminishing returns — if nothing changed in the last 30 minutes, you’re paying to process the same content again

Here’s what I actually sent Harry:

Today I observed something that can improve UX. Three times a day is too slow update, but at the same time if we simply reduce the schedule then we will have too many snapshots stored in the GitHub and also since we are integrating with LLM we will have rate limit issue too. Also we are handling a same batch as a run id, but I am not sure we can maintain that since we will have many runs. Also we should avoid crawling too often to the sources.

Do you understand the engineering challenges and UX gap?

Harry got it immediately:

100% — you’re describing the exact core tension: UX wants fast freshness, infra wants controlled cost/noise. And yes, the current “full run = snapshot + LLM + crawl” model won’t scale well if you increase frequency.

He proposed a detailed 5-point plan. But it was too much at once. I pushed back:

I think this is a bit too much change at one time.

Then I had a clearer idea for how to structure it:

I have one good idea. What if we separate the data level into two tiers and keep the current one as Tier 0 (slow but more decorated data such as LLM labeling and reranking) and create new Tier 1 which has short ingestion time with less post processing? Tier 0 will not directly consume articles from the web but use Tier 1 data as the source of truth.

Harry’s response:

That’s a very strong idea — and cleaner than doing many ad-hoc tweaks.


The two-tier design

The architecture direction was clear: split the pipeline into two tiers with different speeds and costs. I defined the tier responsibilities and the data flow between them. Harry turned it into working code.

Tier-1 (fast lane) — runs every 30 minutes:

  • Collects raw data from sources (with cooldown — if a source was fetched recently, reuse the cached version)
  • Applies lightweight, deterministic scoring (no LLM)
  • Produces a “quick score” based on source reputation, recency, keyword signals
  • Result: a fresh-ish snapshot of what’s out there, updated frequently, at near-zero cost
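
The quick score is the whole trick for keeping Tier-1 cheap: plain arithmetic, no model calls. Here's a minimal sketch of the idea; the weights, field names, and keyword list are assumptions, not the project's actual values.

```python
from datetime import datetime, timezone

# Illustrative reputation table and keyword set; the real ones are per-source tuning.
SOURCE_REPUTATION = {"arxiv.org": 0.9, "openai.com": 0.8, "random-blog.net": 0.3}
HOT_KEYWORDS = {"gpt", "claude", "benchmark", "open source", "release"}

def quick_score(item: dict, now: datetime | None = None) -> float:
    """Deterministic Tier-1 score: source reputation + recency + keyword signals."""
    now = now or datetime.now(timezone.utc)

    reputation = SOURCE_REPUTATION.get(item["source"], 0.5)

    age_hours = (now - item["published_at"]).total_seconds() / 3600
    recency = max(0.0, 1.0 - age_hours / 24)  # decays to zero over a day

    title = item["title"].lower()
    keyword_signal = min(1.0, sum(kw in title for kw in HOT_KEYWORDS) / 3)

    return 0.5 * reputation + 0.3 * recency + 0.2 * keyword_signal
```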

Tier-0 (deep lane) — runs 3 times a day:

  • Takes Tier-1’s output as input (no duplicate crawling)
  • Runs full LLM classification, summary generation, and reranking
  • Produces the curated, high-quality feed
  • Result: the “good” output, less frequent but much smarter

The feed API then blends both: Tier-0’s curated items form the backbone, with Tier-1’s fresh items sprinkled in at the top. Fresh items get a badge so users know their scores are preliminary, not fully curated.
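
Here's a rough sketch of that blend, assuming each item carries a URL and Tier-1 items carry their quick score (the slot count and badge field are made up for illustration):

```python
def blend_feed(tier0_items: list[dict], tier1_items: list[dict],
               fresh_slots: int = 5) -> list[dict]:
    """Tier-0 curated items are the backbone; a few fresh Tier-1 items sit on top."""
    curated_urls = {item["url"] for item in tier0_items}

    # Fresh = Tier-1 items the last full run hasn't processed yet, best quick scores first.
    fresh = [i for i in tier1_items if i["url"] not in curated_urls]
    fresh.sort(key=lambda i: i["quick_score"], reverse=True)
    fresh = [{**i, "badge": "fresh"} for i in fresh[:fresh_slots]]  # mark as preliminary

    return fresh + tier0_items
```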


Making it not wasteful

The two-tier split was the big idea. But making it actually efficient required several smaller decisions:

Collector cooldown

Sources have a configurable cooldown period. If a source was fetched less than N minutes ago, the fast lane reuses the previous raw data instead of hitting the source again. This means 48 Tier-1 runs per day don’t mean 48 crawls — most runs reuse cached data and only re-fetch sources whose cooldown expired.
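
A minimal sketch of that check, assuming a simple cache keyed by source id (the real thing persists between runs; names here are illustrative):

```python
import time

def collect(source: dict, cache: dict, fetch, cooldown_minutes: int = 120):
    """Reuse cached raw data if the source was fetched recently, otherwise re-fetch."""
    entry = cache.get(source["id"])
    if entry and time.time() - entry["fetched_at"] < cooldown_minutes * 60:
        return entry["data"], "cooldown"  # inside the cooldown window: no network call

    data = fetch(source)                  # cooldown expired (or first fetch): hit the source
    cache[source["id"]] = {"fetched_at": time.time(), "data": data}
    return data, "fetched"
```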

No-delta skip

Before running the expensive Tier-0 pipeline, we check: did the input actually change since the last full run? If Tier-1 produced the same set of items (no new content from any source), the full run skips entirely. No LLM calls, no new artifacts, no wasted money.
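
The comparison can be as simple as hashing the set of item URLs in Tier-1's output and checking it against the previous full run. A sketch, with the fingerprint scheme as an assumption:

```python
import hashlib
import json

def tier1_fingerprint(tier1_items: list[dict]) -> str:
    """Order-insensitive hash of the Tier-1 item set."""
    urls = sorted(item["url"] for item in tier1_items)
    return hashlib.sha256(json.dumps(urls).encode()).hexdigest()

def should_run_tier0(tier1_items: list[dict], last_fingerprint: str | None) -> bool:
    """Run the deep lane only if Tier-1's content actually changed."""
    return tier1_fingerprint(tier1_items) != last_fingerprint
```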

Retention compaction

More runs means more snapshots stored in the repo. We added a compaction policy:

  • Tier-1 snapshots: keep full detail for 3 days, then compact to one-per-day
  • Tier-0 snapshots: keep full detail for 7 days, then compact

This keeps the repo lean without losing historical data for debugging.
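
A sketch of the rule, treating each snapshot as just a timestamp (the retention windows match the list above; everything else is illustrative):

```python
from datetime import datetime, timedelta, timezone

def compact(snapshots: list[datetime], keep_full_days: int,
            now: datetime | None = None) -> list[datetime]:
    """Keep every snapshot inside the window; older ones collapse to one per day."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_full_days)

    recent = [s for s in snapshots if s >= cutoff]
    one_per_day: dict = {}
    for s in sorted(s for s in snapshots if s < cutoff):
        one_per_day[s.date()] = s  # the last snapshot of each older day survives

    return sorted(one_per_day.values()) + sorted(recent)

# Tier-1: compact(tier1_snapshots, keep_full_days=3)
# Tier-0: compact(tier0_snapshots, keep_full_days=7)
```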

Deterministic flow order

Full runs now enforce a strict sequence: collect → build Tier-1 → run Tier-0 (using Tier-1 as input). No parallel paths, no ambiguity about what data feeds what.
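
In code shape, a full run is just three calls in a fixed order, with Tier-0 only ever seeing Tier-1's output. The step functions are injected so this stays a self-contained sketch rather than the real implementation:

```python
def ordered_full_run(sources, collect, build_tier1, run_tier0):
    """Strict order: collect, then Tier-1, then Tier-0. Tier-0 never crawls the web itself."""
    raw = [collect(src) for src in sources]  # cooldown-aware collection
    tier1 = build_tier1(raw)                 # quick-scored fast-lane snapshot
    tier0 = run_tier0(tier1)                 # deep lane consumes Tier-1 as its only input
    return tier1, tier0
```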


The operational layer

Building it is one thing. Knowing if it’s working is another.

We added a daily ops summary — a single command that reports:

  • How many Tier-0 and Tier-1 runs happened in the last 24 hours
  • Latest item counts from each tier
  • Ingest status distribution (how many sources succeeded, how many hit cooldown, how many errored)

This takes the two-tier system from “I think it’s working” to “I can see it’s working.”
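
A sketch of what that summary computes, assuming each run record carries a tier, a finish time, an item count, and per-source ingest statuses (all field names are illustrative):

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def ops_summary(runs: list[dict], now: datetime | None = None) -> dict:
    """Daily ops report: run counts per tier, latest item counts, ingest status mix."""
    now = now or datetime.now(timezone.utc)
    last_24h = [r for r in runs if now - r["finished_at"] <= timedelta(hours=24)]

    runs_per_tier = Counter(r["tier"] for r in last_24h)
    latest = {tier: max((r for r in last_24h if r["tier"] == tier),
                        key=lambda r: r["finished_at"])
              for tier in runs_per_tier}
    statuses = Counter(s for r in last_24h for s in r["ingest_statuses"])

    return {
        "runs_last_24h": dict(runs_per_tier),
        "latest_item_counts": {t: r["item_count"] for t, r in latest.items()},
        "ingest_status_distribution": dict(statuses),  # fetched / cooldown / error
    }
```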


Checking our work

At the end of the day, I pulled up my original message and asked Harry: “Have we achieved this goal?”

His checklist:

  • “3 times/day feels too slow” → ✅ Tier-1 fast lane every 30 min + feed freshness blend
  • “More runs will bloat GitHub snapshots” → ✅ Retention/compaction policy
  • “LLM rate limits/cost with higher frequency” → ✅ Tier-1 is lightweight (no LLM), Tier-0 has no-delta skip
  • “Run-id semantics with many runs” → ✅ Added ingest_batch_id lineage
  • “Avoid crawling too often” → ✅ Collector cooldown + reuse behavior

Every concern from the morning had a concrete solution by evening. That’s a satisfying day.


What actually changed for users

Before: feed updates 3x/day. Between updates, stale content.

After: fresh items appear within 30 minutes of being published by sources. They’re scored with a lightweight algorithm, not LLM-curated, but they’re there. When the full run fires, they get properly classified and reranked.

The LLM budget stayed roughly the same. Source crawling actually decreased (cooldown means fewer total fetches). The repo stopped growing as fast (compaction). And the feed feels alive.


The pattern

This is a pattern I keep seeing with AI-powered products: the first version works but has an obvious UX gap. The fix isn’t more compute — it’s smarter architecture.

The instinct is always “run it more.” The answer is usually “run different things at different speeds.”

Tier-1 handles freshness. Tier-0 handles quality. Neither could do both well on its own. Together, they cover the full experience — and the total cost barely moved.


The feed is live at llm-digest.com. If you’re curious about the earlier chapters: how we built it, how we scaled it, teaching it to read, and accumulating intelligence.