The Token Dilemma: Why AI Security Scales on Architecture, Not Model Pricing

Last updated: June 10, 2026

Co-Founder & CTO

A few weeks ago I wrote about why AI models getting dramatically better at finding vulnerabilities actually makes life harder for application security teams, not easier. One thread I want to pull on now: the economics of how you actually deploy that capability across an enterprise codebase.

The answer most vendors are quietly giving (we run our model on every commit, every PR, every branch) falls apart the moment you do the math. I call this the token dilemma. It’s the constraint shaping every serious AI security architecture conversation I’m in.

The math nobody wants to do out loud

Inference costs have dropped dramatically over the last two years, and they’ll keep dropping. It’s tempting to assume the curve continues until “scan everything with AI” becomes affordable.

It won’t.

The pricing spread between the cheapest and most capable models is widening, not narrowing, which means choosing where to spend the expensive scan matters more than ever, not less. And reasoning over an entire codebase on every PR will always cost orders of magnitude more than running a regex. That’s a structural difference between pattern matching and reasoning. No amount of cheaper inference closes it.

For a small startup, the bill is annoying but tolerable. For an enterprise with hundreds of millions of lines of code across thousands of repos and tens of thousands of PRs a week, “deep AI scan on every commit” is a number that ends conversations with finance. Not because security doesn’t care. Because the math doesn’t work.
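To make the arithmetic concrete, here is a back-of-envelope sketch. Every input is a placeholder chosen for illustration, not a measured figure or a vendor price, and the function name and parameters are hypothetical. Substitute your own PR volume, context size, and model pricing.

```python
def weekly_deep_scan_cost(
    prs_per_week: int,
    tokens_per_scan: int,
    usd_per_million_tokens: float,
    deep_scan_fraction: float = 1.0,
) -> float:
    """Estimate weekly spend in USD when a fraction of PRs gets a deep AI scan."""
    if not 0.0 <= deep_scan_fraction <= 1.0:
        raise ValueError("deep_scan_fraction must be between 0 and 1")
    scans = prs_per_week * deep_scan_fraction
    return scans * tokens_per_scan * usd_per_million_tokens / 1_000_000


# Placeholder inputs (assumptions, not benchmarks).
PRS_PER_WEEK = 20_000
TOKENS_PER_SCAN = 500_000
USD_PER_MILLION_TOKENS = 5.0

scan_everything = weekly_deep_scan_cost(
    PRS_PER_WEEK, TOKENS_PER_SCAN, USD_PER_MILLION_TOKENS
)
scan_selectively = weekly_deep_scan_cost(
    PRS_PER_WEEK, TOKENS_PER_SCAN, USD_PER_MILLION_TOKENS, deep_scan_fraction=0.05
)

print(f"every PR:  ${scan_everything:,.0f} per week")
print(f"5% of PRs: ${scan_selectively:,.0f} per week")
```

The absolute numbers matter less than the shape: spend scales linearly with the fraction of PRs routed to the deep scan, which is why the routing decision dominates the bill.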

So now you have a new question: which slice of your codebase, or your PRs, or your services, deserves the deep scan? That question is itself a security decision. Get it wrong and you’re paying premium prices to scan the wrong things while the real risks ship through cheaper layers that aren’t designed to catch them.

Why the obvious answers don’t work

I’ve watched teams try three approaches. None of them work cleanly.

Approach #1: Scan high-risk services.

Sounds clean. Falls apart in practice. Risk isn’t static. A microservice that handled non-sensitive data last quarter might be touching payment flows this quarter because someone refactored. The repo nobody touched in two years just got a major dependency update that changed its threat model overnight. Static priority lists go stale fast, and the moment they do, your AI budget is being spent on yesterday’s risk profile.

Approach #2: Scan suspicious diffs.

Better, but it raises the question: who decides what’s suspicious? If you’re using a heuristic, you’ve moved the prioritization problem one layer up, and the heuristic is the new attack surface. If you’re using a separate AI to triage which diffs deserve the expensive AI, you’re paying for two AIs instead of one.

Approach #3: Let developers tag what’s important.

Don’t. Developers are not the right people to estimate the security risk of their own code, and they don’t have time anyway. Anyone who has watched a developer triage their own AI-generated output already knows where this leads. Adding “manually rate your security risk” to that workflow is not how this ends well.

What you actually need is a system that decides where to spend the expensive scan based on signals the deterministic layer is already producing: change patterns, dependency shifts, exposure changes, blast radius. Not developer self-reporting. Not stale priority lists. Not a second AI guessing what the first one should look at.
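As a minimal sketch of what deciding from deterministic signals could look like, the snippet below folds the four signals named above into a single priority score. The schema, the weights, and the saturation points are all hypothetical, picked only to show the shape of the idea. This is not a description of any product’s actual scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSignals:
    """Signals a deterministic layer might already emit (hypothetical schema)."""
    files_changed: int
    dependency_changed: bool
    exposure_changed: bool      # e.g. an endpoint that became publicly reachable
    downstream_services: int    # rough proxy for blast radius


# Illustrative weights that sum to 1.0; a real system would tune these.
WEIGHTS = {"change": 0.2, "dependency": 0.25, "exposure": 0.3, "blast_radius": 0.25}


def deep_scan_priority(s: ChangeSignals) -> float:
    """Combine signals into a score in [0, 1]; higher means more worth a deep scan."""
    change = min(s.files_changed, 50) / 50         # saturate at 50 files
    blast = min(s.downstream_services, 20) / 20    # saturate at 20 services
    return (
        WEIGHTS["change"] * change
        + WEIGHTS["dependency"] * float(s.dependency_changed)
        + WEIGHTS["exposure"] * float(s.exposure_changed)
        + WEIGHTS["blast_radius"] * blast
    )


typo_fix = ChangeSignals(
    files_changed=1, dependency_changed=False,
    exposure_changed=False, downstream_services=0,
)
new_public_endpoint = ChangeSignals(
    files_changed=12, dependency_changed=True,
    exposure_changed=True, downstream_services=15,
)

print(deep_scan_priority(typo_fix) < deep_scan_priority(new_public_endpoint))
```

Nothing in the score depends on a developer’s self-assessment or a hand-maintained priority list. Every input is something the always-on scanners can recompute on each change.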

How we think about it at Cycode

The token dilemma isn’t a problem you solve by being clever about prompts or model selection. You solve it with architecture.

The bottom layer is deterministic, fast, and always on.

SAST, SCA, secrets, IaC, policy. Cheap enough to run on every commit, fast enough to give inline feedback to whoever is writing the code (developers directly, AI coding agents via MCP), and comprehensive enough to catch the entire pattern-based attack surface.

This isn’t a fallback. It’s the load-bearing layer that makes everything above it economically viable, and it’s the layer that increasingly turns vulnerabilities into auto-fixes before a human ever sees them.

The top layer is AI-driven analysis, and it runs selectively.

You scan deeply when the signals say it’s worth it: meaningful changes to high-blast-radius services, anomalies the deterministic layer surfaces but can’t fully resolve, release candidates heading to production, code paths touching sensitive data flows. The expensive scan is reserved for cases where its marginal detection power justifies the cost.
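The four triggers above can be written down as explicit rules. The sketch below is one hypothetical way to do that. The field names are assumptions, and how each flag gets computed (what counts as “meaningful” or “high blast radius”) is left out on purpose.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChangeContext:
    """Facts about a change, gathered before any AI scan runs (hypothetical schema)."""
    meaningful_change: bool
    high_blast_radius: bool
    unresolved_anomalies: int     # deterministic findings needing deeper analysis
    is_release_candidate: bool
    touches_sensitive_data: bool


def deep_scan_reasons(ctx: ChangeContext) -> List[str]:
    """Return every trigger that fired; an empty list means skip the deep scan."""
    reasons = []
    if ctx.meaningful_change and ctx.high_blast_radius:
        reasons.append("meaningful change to a high-blast-radius service")
    if ctx.unresolved_anomalies > 0:
        reasons.append("anomaly the deterministic layer could not resolve")
    if ctx.is_release_candidate:
        reasons.append("release candidate heading to production")
    if ctx.touches_sensitive_data:
        reasons.append("code path touches a sensitive data flow")
    return reasons


readme_edit = ChangeContext(False, False, 0, False, False)
payments_rc = ChangeContext(True, True, 0, True, True)

print(deep_scan_reasons(readme_edit))
print(deep_scan_reasons(payments_rc))
```

Returning the reasons rather than a bare yes/no keeps each expensive scan explainable: anyone reviewing the spend can see which trigger justified it.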

The connective tissue is the platform.

Something has to decide, in real time, which findings need the deeper look. That decision is itself part of the product. It uses context: change graph, ownership data, exploitability signals, deployment topology. It treats every AI scan as a budgeted resource, not a free action. And it gets better over time, because every scan it runs is feedback on what the deeper analysis surfaces versus what the deterministic layer already caught.
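Treating each AI scan as a budgeted resource can be sketched as a small allocation problem. The example below uses a greedy priority-per-token heuristic, which is a simplification I am assuming for illustration, not an optimal allocator and not a description of a real router. The Candidate fields and the budget figure are hypothetical, and the feedback loop described above is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A change that might deserve a deep scan (hypothetical schema)."""
    pr_id: str
    priority: float    # e.g. a score derived from deterministic signals
    est_tokens: int    # estimated token cost of a deep scan


def route_deep_scans(candidates: List[Candidate], token_budget: int) -> List[Candidate]:
    """Greedily pick scans with the best priority per token until the budget runs out."""
    for c in candidates:
        if c.est_tokens <= 0:
            raise ValueError(f"{c.pr_id}: est_tokens must be positive")
    ranked = sorted(candidates, key=lambda c: c.priority / c.est_tokens, reverse=True)
    chosen: List[Candidate] = []
    remaining = token_budget
    for c in ranked:
        if c.est_tokens <= remaining:
            chosen.append(c)
            remaining -= c.est_tokens
    return chosen


queue = [
    Candidate("pr-a", priority=0.9, est_tokens=400_000),
    Candidate("pr-b", priority=0.5, est_tokens=100_000),
    Candidate("pr-c", priority=0.2, est_tokens=600_000),
]
selected = route_deep_scans(queue, token_budget=500_000)
print([c.pr_id for c in selected])
```

Ranking by priority per token favors cheap, moderately risky scans over one expensive scan of similar total value. A router that must never skip the single riskiest change would rank on priority alone and accept lower coverage per dollar.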

Why this gets worse before it gets better

The token dilemma intensifies as AI-generated code volume grows. Code volume is going up faster than inference costs are coming down, which means the gap between “scan everything with AI” and “scan economically with AI” is widening, not closing.
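The claim reduces to a one-line relationship: scan-everything spend is volume times unit price, so it rises whenever volume growth outpaces the price decline. The sketch below makes that condition explicit. The growth and decline rates are placeholders, not forecasts, and the function name is hypothetical.

```python
def spend_multiplier(volume_growth: float, price_decline: float, years: int = 1) -> float:
    """Factor by which scan-everything spend changes after `years`.

    volume_growth: yearly fractional growth in code scanned (1.0 means it doubles).
    price_decline: yearly fractional drop in price per token (0.5 means it halves).
    """
    if not 0.0 <= price_decline < 1.0:
        raise ValueError("price_decline must be in [0, 1)")
    if volume_growth <= -1.0:
        raise ValueError("volume_growth must be greater than -1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return ((1 + volume_growth) * (1 - price_decline)) ** years


# Placeholder rates: code volume triples each year while token prices halve.
growing = spend_multiplier(volume_growth=2.0, price_decline=0.5, years=3)
# Placeholder rates: volume grows 20% a year while token prices halve.
shrinking = spend_multiplier(volume_growth=0.2, price_decline=0.5, years=3)

print(growing > 1.0, shrinking < 1.0)
```

Cheaper inference only rescues “scan everything” in the second regime, where prices fall faster than volume grows. The argument here is that AI-generated code puts most enterprises in the first.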

Organizations that figure out the routing problem early will spend their AI budget on the right slice of their codebase. The ones that don’t will burn through their budget on noise and end up with worse coverage than the deterministic layer alone would have given them.

The takeaway

The token dilemma isn’t a temporary constraint waiting for cheaper models to make it disappear. It’s a permanent feature of how AI scanning gets deployed at any real codebase scale.

Treating it as a constraint to design around, rather than a problem to throw more inference at, is the difference between an AppSec program that scales and one that doesn’t.

At Cycode, deterministic scanning is the always-on first layer. AI analysis is the targeted second layer. A routing system decides where the expensive scan earns its cost. The point isn’t that we have AI in our product. Everyone has AI in their product. The point is that the deterministic and AI layers act as a single coordinated system, where each one does what it’s structurally best at, and neither is asked to do work the other can do more cheaply.

Cleaner code at 100x volume needs detection at 100x scale. The only way to get there is by being honest about what AI is actually for. And what it isn’t.

Want to see how Cycode’s two-layer architecture works in practice? Cycode’s Agentic Development Security Platform combines always-on deterministic scanning with AI-powered analysis, routed by context, not guesswork. Learn more at cycode.com

Originally published: June 10, 2026
