← Slava Rudenko

Cutting per-query cost from $7 to $0.20

At Davidovs VC I built an agentic platform that reads startup pitch decks and writes investment memos. It ran the first-pass evaluation for a team looking at hundreds of companies a year. The contract ran from September 2025 to March 2026.

The first version worked. It was also expensive. Every query ran on the frontier model, top to bottom. About seven to ten dollars a call.

At the volume we processed, that adds up. Enough to notice on the bill. Enough that cost started shaping what I was willing to let the system do.

The fix was not a better model. It was routing.

Here is the rule I landed on. Cheap models handle drafts. Mid-tier models handle reasoning. Frontier models handle the calls that matter.

Most of a memo is not hard. Pull the numbers out of the deck. Restate the ask. Summarize what the company does. A cheap model does that fine, and I can check it in a glance.

The hard part is judgment. Is this a real market. Does the traction hold up under a second look. What is the risk nobody in the deck is naming. That work gets the expensive model, because that is where a wrong answer costs something.

So I tiered it. Each step routes to the cheapest model that can do it well. The frontier model only fires on the steps that actually need it.

Per-query cost dropped from about seven dollars to twenty cents. Roughly ninety-seven percent off, with no drop in the quality of the calls that mattered.

Here is the part I care about. Per-request unit economics is a product decision, not an infra detail.

If every call costs seven dollars, you build a different product than if it costs twenty cents. You gate features. You batch requests. You make users wait, or ask them to confirm before the expensive thing runs. The cost leaks into the roadmap and starts making decisions for you.

At twenty cents a call, that pressure goes away. You let the system run on everything. You stop rationing. The cost stops shaping the product.

The routing layer is where that decision lives. Whoever owns the routing owns the margin. It is not plumbing you set up once and forget. It is the dial that decides what kind of product you can afford to ship.

So I treat cost per request as a first-class metric, right next to latency and quality. If I can not see what a single call cost, I can not route on it, and I am flying blind on the one number that decides the unit economics of the whole thing.

Build the routing layer early. Measure per-call cost from day one. It is cheaper than rebuilding the product later around a bill you did not see coming.