Release
Why “technically correct” is the lowest form of success in the agentic era.
By Jeremy Wayland and Sidney Gathrid
Every engineer who has used Claude Code or let an agent loose on their repo knows the exact flavor of dread that comes next: a notification pops up, and you are staring at an 800-line diff spanning 14 files, generated in under five minutes.
The CI pipeline is blindingly green. The micro-syntax is flawless. But as you start scrolling, the brutal reality of the agentic era hits you: we are now approving code we don't actually understand.
The cognitive load is completely asymmetrical. The agent solved the immediate ticket, sure, but it did so by hallucinating a slightly worse copy of a utility module that already existed three folders away and duct-taping a bizarre abstraction around a simple function. By line 200, your eyes glaze over. Exhausted by the sheer volume, you succumb to green-light fatigue, trust the tests, and hit Squash and Merge.
This is the hidden tax of agentic coding: we are generating code at machine speed but reviewing it at human speed. The binary pass/fail of a unit test only proves the code ran; it tells you absolutely zero about whether the agent actually grokked your system. Without structural guardrails, agentic speed is just automated entropy—a high-speed spray of perfectly functional, completely unreviewable code that slowly suffocates your project from the inside out.
Topos was built to break this cycle. It discards subjective code review in favor of structural proof. By parsing programs into their underlying mathematical graphs—control flow, module dependency, and data flow—Topos measures the true cost of a diff. It turns vague demands to “clean this up” into concrete, verifiable targets your agents can actually aim for.
Agents have made code cheap. That is useful, but it changes the bottleneck. The scarce resource is no longer a first draft of the implementation; it is judgment about whether the draft belongs in the codebase.
The default evaluation loop is still binary: tests either pass or fail. That only describes two states. Production software needs a richer vocabulary: readable but tightly coupled, secure but unmaintainable, locally correct but globally awkward, and everything in between.
“Correctness is table stakes.” Passing tests tells you the code ran. It does not tell you whether the agent understood the repo.
This is the agentic spray problem: when generation outruns codebase understanding, you get more code than clarity. Duplicate modules, fragile linkages, extra wrappers, and review cycles that burn tokens explaining context the agent should have respected. Topos gives that review a measurable object: the structure of the program itself.
Agents choke on “make it better.” They thrive on structured feedback and achievable next steps. A map of quality states—not a single pass/fail bit—is what makes the difference. That map is the quality lattice behind the medals.
Most analysis tools scan syntax—style violations, known anti-patterns, missing semicolons. Topos analyzes the shape of the program itself: its structure, independent of what language it is written in. That distinction is what makes structural medals something agents can genuinely optimize toward rather than superficially satisfy.
Time and tokens are finite. Ideal code is not always achievable in one pass—and it shouldn't have to be. Topos lets you set pillar priorities so agents pursue the best program structure you can afford without burning cycles on dimensions you don't care about yet. Those preferences induce an order on the lattice: concrete instructions for exactly how to relax, medal by medal, instead of vague “polish everything” prompts.
The framework draws on category theory—a branch of mathematics built to specify what we mean in messy, multi-dimensional domains. For readers who want the formal construction, the deep dive is below.
Topos measures code quality and turns it into something you can aim at: a Code Quality Medal per file. Concrete tiers with defined structural criteria—not a vague directive to “make it better.”
Each file is scored on three independent pillars. You can pass any combination; Gold means all three pass.
Set preferences to tell agents which medal (or pillar mix) matters most under time and token budgets. If Gold isn't reachable, they take the next-best medal on your priority list—always a defined next step, never a dead end.
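One way to read "next-best medal on your priority list" is as a lexicographic ranking of pass/fail states, where the first pillar in the preference list matters most. The sketch below follows that reading. It is an assumption for illustration, not a description of how Topos orders states internally; `ranked_states`, `best_reachable`, and the `reachable` callback are hypothetical names.

```python
from itertools import product


def ranked_states(preferences):
    """Order every pass/fail state from most to least preferred.

    Assumes a lexicographic reading of the preference list: passing an
    earlier pillar always outranks passing any later one.
    """
    # product((True, False), ...) already yields tuples in descending
    # lexicographic order, starting at all-pass and ending at all-fail.
    return [dict(zip(preferences, flags))
            for flags in product((True, False), repeat=len(preferences))]


def best_reachable(preferences, reachable):
    """Return the highest-ranked state the caller reports as reachable."""
    for state in ranked_states(preferences):
        if reachable(state):
            return state
    return None


prefs = ["simple", "composable", "secure"]
# Hypothetical budget constraint: COMPOSABLE cannot be fixed in this pass.
target = best_reachable(prefs, lambda state: not state["composable"])
print(target)  # {'simple': True, 'composable': False, 'secure': True}
```

Under this reading the agent always has a defined fallback: if the all-pass state is out of budget, it walks down the ranking until it finds a state it can actually hit.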
Topos doesn't score characters on a page. It parses code into graph representations of program structure and runs probes on those graphs. For v1.0.0 we launch with three independent pillars; the framework extends to new ones over time.
Each pillar maps to a specific graph family:
AST + CFG: Code complexity. Cyclomatic complexity and token entropy on the abstract syntax tree and control-flow graph.
MDG: Module coupling. Martin instability and fan-out on the module dependency graph (via GitNexus).
CPG: Data-flow safety. Dangerous API reachability and taint paths from the code property graph.
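To make the three probes concrete, here is a minimal Python sketch of one metric per pillar. It is an illustration under stated assumptions, not the Topos implementation: all function names are hypothetical, the complexity count is the common "one plus decision points" approximation computed from the AST alone, and the dependency and data-flow graphs are plain dictionaries standing in for real MDG and CPG output.

```python
import ast
from collections import deque
from typing import Dict, Set

# Hypothetical helpers for illustration; these are not Topos APIs.
_DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                   ast.ExceptHandler, ast.IfExp, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of `and`/`or` adds one short-circuit path.
            complexity += len(node.values) - 1
    return complexity


def martin_instability(deps: Dict[str, Set[str]], module: str) -> float:
    """Martin instability I = Ce / (Ca + Ce) for one module.

    Ce (efferent coupling, or fan-out) counts modules this one depends on;
    Ca (afferent coupling) counts modules that depend on it.
    """
    efferent = len(deps.get(module, set()))
    afferent = sum(1 for name, targets in deps.items()
                   if name != module and module in targets)
    total = afferent + efferent
    return efferent / total if total else 0.0


def reaches_sink(flows: Dict[str, Set[str]],
                 sources: Set[str], sinks: Set[str]) -> bool:
    """Breadth-first search: can any tainted source flow into a sink?"""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in sinks:
            return True
        for nxt in flows.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


SAMPLE = '''
def f(x):
    if x > 0 and x < 10:
        return 1
    for i in range(x):
        pass
    return 0
'''
print(cyclomatic_complexity(SAMPLE))  # 4: base path + if + `and` + for

deps = {"app": {"utils", "db"}, "db": {"utils"}, "utils": set()}
print(martin_instability(deps, "app"))    # 1.0 (depends on others, nothing depends on it)
print(martin_instability(deps, "utils"))  # 0.0 (depended upon, depends on nothing)

flows = {"request.args": {"query"}, "query": {"os.system"}}
print(reaches_sink(flows, {"request.args"}, {"os.system"}))  # True
```

Each function reads a different graph, which is why a file can score well on one pillar and poorly on another.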
These are separate dimensions of quality—like stats on a character sheet, not one blended GPA.
Three pass/fail pillars yield eight structural quality states (2 × 2 × 2 combinations). Medals group those states into tiers that agents and humans can act on.
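The eight states can be enumerated directly. The sketch below models each state as the set of pillars a file passes and compares states by set inclusion, which is one natural lattice order. Treat that ordering as an assumption: the only tier the text pins down is Gold (all three pillars pass), and the helper names here are hypothetical.

```python
from itertools import product

PILLARS = ("simple", "composable", "secure")


def quality_states():
    """All 2**3 pass/fail combinations, each as a frozenset of passed pillars."""
    return [frozenset(p for p, ok in zip(PILLARS, flags) if ok)
            for flags in product((True, False), repeat=len(PILLARS))]


def is_at_least(a, b):
    """Assumed partial order: a dominates b if it passes every pillar b passes."""
    return b <= a


states = quality_states()
gold = frozenset(PILLARS)

print(len(states))                                 # 8
print(all(is_at_least(gold, s) for s in states))   # True: Gold tops the lattice

# Some states are incomparable: neither one dominates the other.
only_simple = frozenset({"simple"})
only_secure = frozenset({"secure"})
print(is_at_least(only_simple, only_secure))       # False
print(is_at_least(only_secure, only_simple))       # False
```

Incomparable states are the reason a priority list is needed at all: set inclusion alone cannot say whether "simple only" beats "secure only".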
Topos separates structural analysis into three layers: parsing code into graphs, scoring a single file, and comparing versions.
How does Topos decide a probe result is good enough to earn a pillar? Each pillar verdict comes from threshold rules calibrated on real-world data. We ran extensive experiments against the Top 100 Python libraries on PyPI (e.g., requests, numpy, pandas) to map raw metrics to quality inflection points, such as caps on cyclomatic complexity for SIMPLE or a hard floor on dangerous API reachability for SECURE.
By anchoring thresholds in how top-tier open source actually looks, a Gold medal reflects engineering standards you'd recognize in production—not arbitrary limits.
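As a rough picture of what corpus-calibrated thresholds can look like, the sketch below derives a cap from a percentile of observed metric values and then applies it as a pass/fail rule. The percentile rule, the function names, and the sample values are all assumptions made for illustration; the actual calibration procedure and threshold values are not specified here.

```python
import statistics


def calibrate_cap(samples, percentile=90):
    """Hypothetical calibration: use a percentile of corpus values as the cap."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles,
    # so index percentile - 1 holds the requested percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[percentile - 1]


def passes_cap(value, cap):
    """A metric passes when it does not exceed the calibrated cap."""
    return value <= cap


# Synthetic placeholder values, NOT real measurements from PyPI libraries.
corpus_complexities = list(range(1, 102))

cap = calibrate_cap(corpus_complexities, percentile=90)
print(passes_cap(12, cap))   # True: well under the cap
print(passes_cap(95, cap))   # False: above the 90th percentile of the corpus
```

The design point is that the cap comes from data rather than from a hand-picked constant, so re-running the calibration on a different corpus shifts the threshold without changing the rule.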
Install the CLI and see medals on your repo in minutes—start building a structural record for your codebase.
# Install CLI
curl -fsSL https://docs.krv.ai/topos/install.sh | sh

# Evaluate a directory (preferences = pillar priority)
topos evaluate src/ -r --preferences simple,composable,secure
Output is per-file pillar pass/fail plus a medal. For COMPOSABLE scores, run topos depgraph generate once per repo (GitNexus). See the Topos docs for details.
The agentic era doesn't have a code shortage. It has a judgment gap. Agents ship faster than understanding travels—and the distance between generation and comprehension is exactly where debt accumulates.
Topos closes that gap by making structure legible. Not as a linter's list of style complaints, but as a verdict: this file is structurally sound, this one is not, and here is exactly what it would take to change that.
The code your agents write today is the system your team will diagnose at 2 a.m. Measure it now.
Topos v1.0.0 is live and open for exploration. Star it on GitHub or follow @krv_labs for updates.
Open source framework for defining and optimizing code generation priorities using finite lattices.
Core concepts, API reference, and guides for managing agentic code generation quality.
J. Wayland's thesis on strategic voting under uncertainty and the commutative monoidal structure of elections, via category theory (UC Berkeley Mathematics honors thesis, 2019).
Sridhar Mahadevan's work on topos causal models: sheaves, subobject classifiers, and intuitionistic logic for specifying causal structure beyond Boolean SCMs (NeurIPS 2025).