Introducing Topos: Quality is the New Currency

TL;DR

The Problem: Agents generate code faster than humans can review it. Binary pass/fail tests prove the code ran—they don't prove it belongs.
The Threat: The agentic spray problem. Without structural guardrails, agentic speed is automated entropy: duplicate modules, fragile abstractions, and review fatigue that merges things no one actually understands.
What Topos Does: Parses code into graph representations and scores three independent structural pillars, awarding a Quality Medal per file: Bronze, Silver, Gold, or Slop.
Why It Matters: You cannot prompt away bad architecture. Concrete structural targets let agents build toward maintainability—not just a green test suite.
The Name: A topos generalizes Set to a category with its own internal logic. Where Set uses {0,1} as its subobject classifier, ours is the eight-element Heyting algebra of subsets of {Simple, Composable, Secure} (structural qualities of programs represented as graphs), whose elements are exactly the structural states that the medals group into tiers for agents to navigate.

Krv-Labs/topos is open source: star the repo, or jump straight to installation.

A New Breed of Fatigue

Every engineer who has used Claude Code or let an agent loose on their repo knows the exact flavor of dread that comes next: a notification pops up, and you are staring at an 800-line diff spanning 14 files, generated in under five minutes.

The CI pipeline is blindingly green. The micro-syntax is flawless. But as you start scrolling, reality hits you: we are now approving code we don't actually understand.

The cognitive load is asymmetrical. The agent solved the immediate ticket, sure, but it did so by hallucinating a slightly worse copy of a utility module that already existed three folders away and duct-taping a bizarre abstraction around a simple function. By line 200, your eyes glaze over. Exhausted by the sheer volume, you succumb to green-light fatigue, trust the tests, and hit Squash and Merge.

“We are generating code at machine speed but reviewing it at human speed.”

The binary pass/fail of a unit test only proves the code ran; it tells you absolutely nothing about whether the agent actually grokked your system. Without structural guardrails, agentic speed is automated entropy: more code, less clarity, until review collapses under its own weight.

Topos was built to break this cycle. Instead of reviewing 60-file diffs line by line, you review structural properties—and how they evolve. We've already changed how we write code. It's time to change how we review it. By parsing programs into their underlying mathematical graphs—control flow, module dependency, and data flow—Topos measures the true cost of a diff. It turns vague demands to “clean this up” into concrete, verifiable targets your agents can actually aim for.

Beyond Pass/Fail

Agents have made code cheap. That is useful, but it changes the bottleneck. The scarce resource is no longer a first draft of the implementation; it is judgment about whether the draft belongs in the codebase.

The default evaluation loop is still binary: tests either pass or fail. That only describes two states. Production software needs a richer vocabulary: readable but tightly coupled, secure but unmaintainable, locally correct but globally awkward, and everything in between.

“Correctness is table stakes.” Passing tests tells you the code ran. It does not tell you whether the agent understood the repo.

This is the agentic spray problem: when generation outruns codebase understanding, you get more code than clarity. Duplicate modules, fragile linkages, extra wrappers, and review cycles that burn tokens explaining context the agent should have respected. Topos gives that review a measurable object: the structure of the program itself.

How Topos Works

Topos measures code quality and turns it into something you can aim at: a Code Quality Medal per file. Concrete tiers with defined structural criteria—not a vague directive to “make it better.”

Each file is scored on three independent pillars. You can pass any combination; Gold means all three pass.

GOLD: Passes all 3 (Simple + Composable + Secure)
SILVER: Passes 2 of 3 (e.g., Simple & Secure)
BRONZE: Passes 1 of 3 (e.g., Composable only)
SLOP: Passes 0 (or fails to parse)
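The tier rule above amounts to counting passed pillars. Here is a minimal sketch, assuming each pillar has already been reduced to a pass/fail boolean. The name `medal_for` and its parameters are illustrative only and are not part of the Topos API.

```python
# Hypothetical helper: maps pillar pass/fail results to a medal tier.
MEDALS = {3: "GOLD", 2: "SILVER", 1: "BRONZE", 0: "SLOP"}


def medal_for(simple: bool, composable: bool, secure: bool, parsed: bool = True) -> str:
    """Return the medal for one file, following the tier table in the text."""
    if not parsed:
        # A file that fails to parse is SLOP regardless of pillar results.
        return "SLOP"
    passed = sum([simple, composable, secure])  # booleans count as 0 or 1
    return MEDALS[passed]


if __name__ == "__main__":
    print(medal_for(simple=True, composable=False, secure=True))
```

Note that the medal only records how many pillars pass. Which pillars pass is the finer-grained state that the preference system works with.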

Set preferences to tell agents which medal (or pillar mix) matters most under time and token budgets. If Gold isn't reachable, they take the next-best medal on your priority list—always a defined next step, never a dead end.

Why Structure?

Agents choke on “make it better.” They thrive on structured feedback and achievable next steps. A map of quality states—not a single pass/fail bit—is what makes the difference. That map is the quality lattice behind the medals.

Most analysis tools scan syntax—style violations, known anti-patterns, missing semicolons. Topos analyzes the shape of the program itself: its structure, independent of what language it is written in. That distinction is what makes structural medals something agents can genuinely optimize toward rather than superficially satisfy.

Time and tokens are finite. Ideal code is not always achievable in one pass—and it shouldn't have to be. Topos lets you set pillar priorities so agents pursue the best program structure you can afford without burning cycles on dimensions you don't care about yet. Those preferences induce an order on the lattice: concrete instructions for exactly how to relax, medal by medal, instead of vague “polish everything” prompts.

The relaxation walk: Stuck on Gold? Your preference ranking plus the lattice always defines the next medal to aim for—the highest feasible step along the path you chose.
Anti-gaming: Because we measure graph structure, padding comments and shuffling lines don't fool the scorer—only genuine structural improvements move the needle.
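One way to picture the relaxation walk is as a ranked list of pillar combinations. The sketch below assumes a specific ordering rule that the article does not confirm: more passing pillars always wins, and ties go to the combination containing the higher-priority pillars. The function names `relaxation_walk` and `next_target` are hypothetical.

```python
from itertools import combinations


def relaxation_walk(priority):
    """Order every pillar combination from most to least desirable.

    Assumed rule (not taken from Topos): larger combinations come first,
    and ties are broken in favor of higher-priority pillars.
    """
    rank = {pillar: i for i, pillar in enumerate(priority)}
    states = [
        frozenset(combo)
        for size in range(len(priority) + 1)
        for combo in combinations(priority, size)
    ]

    def key(state):
        # Negative size sorts bigger sets first; the sorted ranks break ties.
        return (-len(state), sorted(rank[p] for p in state))

    return sorted(states, key=key)


def next_target(priority, feasible):
    """Return the best state in the walk that the caller deems feasible."""
    for state in relaxation_walk(priority):
        if feasible(state):
            return state
    return frozenset()


if __name__ == "__main__":
    prefs = ("Secure", "Simple", "Composable")
    # Pretend COMPOSABLE cannot be reached within the current budget.
    target = next_target(prefs, lambda s: "Composable" not in s)
    print(sorted(target))
```

With Secure ranked first and Composable out of reach, the sketch falls back from Gold to the Silver state that keeps Secure and Simple.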

The framework draws on category theory—a branch of mathematics built to specify what we mean in messy, multi-dimensional domains. For a gentler introduction, see Getting Started with Category Theory; for the formal construction, the deep dive is below.

Why a Topos? (The Structural Motivation)

Simple, Composable, and Secure are potentially orthogonal dimensions for evaluating code quality. A topos gives us the right mathematical setting for balancing complex evaluations: the medal lattice is its subobject classifier, a Heyting algebra generated by distinct dimensions of code quality. Agents write code based on your preferences by navigating the lattice.

This maps to three natural layers in the source:

Objects (representations): Programs parsed into graphs: AST (abstract syntax tree), CFG (control-flow graph), CPG (code property graph), and MDG (module dependency graph). These live in topos/representations/. Each graph family is the right “view” for one pillar.
Morphisms (probes and profunctors): Structure-preserving maps on those graphs. topos/probes/ contains point-in-time metrics on a single graph (cyclomatic complexity, Martin instability, taint reachability). topos/profunctors/ contains relational comparisons between two programs—detecting cosmetic refactors, powering structural test coverage.
Truth values (the medal lattice): The 8-element Heyting algebra over {Simple, Composable, Secure}. Each element is a legitimate structural state, not a degraded score. Medals are tiers of this lattice. The preference system defines a monotone path through it—so agents always have a concrete next step, even when GOLD isn't reachable.
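The truth-value layer can be made concrete with plain sets. This sketch assumes the eight states are modeled as subsets of the pillar set ordered by inclusion, which matches the 8-element lattice described above but is not taken from the Topos source. All names here are illustrative.

```python
from itertools import combinations

TOP = frozenset({"Simple", "Composable", "Secure"})  # the Gold state

# All 2**3 = 8 subsets, from the empty set (Slop) up to TOP (Gold).
ELEMENTS = [frozenset(c) for k in range(4) for c in combinations(sorted(TOP), k)]


def leq(a, b):
    """Lattice order: a is below b when every pillar in a also passes in b."""
    return a <= b


def meet(a, b):
    """Greatest lower bound: pillars passing in both states."""
    return a & b


def join(a, b):
    """Least upper bound: pillars passing in either state."""
    return a | b


def implies(a, b):
    """Heyting implication; on a powerset it is (complement of a) union b."""
    return (TOP - a) | b


def adjunction_holds():
    """Check the defining law: meet(c, a) <= b  iff  c <= implies(a, b)."""
    return all(
        leq(meet(c, a), b) == leq(c, implies(a, b))
        for a in ELEMENTS
        for b in ELEMENTS
        for c in ELEMENTS
    )


if __name__ == "__main__":
    print(len(ELEMENTS), adjunction_holds())
```

The brute-force check over all triples is what justifies calling the structure a Heyting algebra in this model. The medal tier of a state is simply the size of its set.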

The name isn't decoration, by the way. The topos structure is what lets you add a fourth pillar later without redesigning the medal system—you extend the algebra, and the lattice, the traversal logic, and the agent instructions all follow.

What We Measure

Topos parses code into graph representations of program structure (abstract syntax trees, control-flow graphs, code property graphs, and module dependency graphs) and runs structural probes on those graphs. For v0.3.0 we launch with three independent pillars; the framework extends to new ones over time.

Each pillar maps to a specific graph family:

SIMPLE

AST + CFG

Code complexity. Cyclomatic complexity and token entropy on the abstract syntax tree and control-flow graph.

COMPOSABLE

MDG

Module coupling. Martin instability and fan-out on the module dependency graph (via GitNexus).

SECURE

CPG

Data-flow safety. Dangerous API reachability and taint paths from the code property graph.

These are separate dimensions of quality—like stats on a character sheet, not one blended GPA.
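For readers unfamiliar with the COMPOSABLE metrics, Martin instability is conventionally defined as I = Ce / (Ca + Ce), where Ce counts outgoing dependencies (fan-out) and Ca counts incoming ones. The sketch below computes it on a toy dependency map. The module names are made up, the zero value for isolated modules is an assumption, and nothing here reflects how Topos or GitNexus builds the real graph.

```python
def instability(deps):
    """Compute Martin instability for each module.

    `deps` maps a module name to the set of modules it imports.
    Returns a dict of module name -> I in [0.0, 1.0].
    """
    modules = set(deps) | {dst for targets in deps.values() for dst in targets}

    # Count incoming edges (afferent coupling, Ca), ignoring self-imports.
    fan_in = {m: 0 for m in modules}
    for src, targets in deps.items():
        for dst in targets:
            if dst != src:
                fan_in[dst] += 1

    scores = {}
    for m in modules:
        ce = len(deps.get(m, set()) - {m})  # efferent coupling, i.e. fan-out
        ca = fan_in[m]
        # Assumption: a module with no edges at all is treated as stable.
        scores[m] = ce / (ca + ce) if (ca + ce) else 0.0
    return scores


if __name__ == "__main__":
    toy_graph = {"app": {"utils", "db"}, "db": {"utils"}, "utils": set()}
    for name, score in sorted(instability(toy_graph).items()):
        print(name, score)
```

In the toy graph, `app` imports everything and is imported by nothing, so it sits at 1.0, while `utils` is imported by both other modules and imports nothing, so it sits at 0.0.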

Topos reads the graph, not the syntax. SIMPLE measures cyclomatic complexity and branching regardless of language. The same recursive algorithm in Python and JavaScript scores nearly the same:

Python: cyclomatic 15, SIMPLE 62%

def make_tree(d):
    if d > 0:
        d -= 1
        return (make_tree(d), make_tree(d))
    return (None, None)

def check_tree(node):
    (l, r) = node
    if l is None:
        return 1
    return 1 + check_tree(l) + check_tree(r)

JavaScript: cyclomatic 13, SIMPLE 68%

function bottomUpTree(depth) {
    return depth > 0
        ? new TreeNode(bottomUpTree(depth - 1), bottomUpTree(depth - 1))
        : new TreeNode(null, null);
}

function itemCheck(node) {
    if (node.left === null) return 1;
    return 1 + itemCheck(node.left) + itemCheck(node.right);
}

Same recursive structure, similar cyclomatic complexity (15 vs 13), and SIMPLE scores within a few points of each other (62% vs 68%).
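To make "cyclomatic complexity" tangible, here is a rough per-function counter built on Python's standard `ast` module, using the classic "decision points plus one" rule. It is an approximation on the syntax tree, not the CFG-based probe Topos uses, and it is not meant to reproduce the figures reported above, which come from Topos itself.

```python
import ast

# Node types treated as decision points in this simplified count.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler, ast.comprehension)


def cyclomatic(func):
    """Approximate cyclomatic complexity of one function node."""
    score = 1  # a single straight-line path
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit branches.
            score += len(node.values) - 1
    return score


def per_function(source):
    """Map each function name in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SOURCE = '''
def make_tree(d):
    if d > 0:
        d -= 1
        return (make_tree(d), make_tree(d))
    return (None, None)

def check_tree(node):
    (l, r) = node
    if l is None:
        return 1
    return 1 + check_tree(l) + check_tree(r)
'''

if __name__ == "__main__":
    print(per_function(SOURCE))
```

Because the count depends only on branching nodes, renaming variables or adding comments leaves it unchanged, which is the property the language-independence claim relies on.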

Three pass/fail pillars give 2 × 2 × 2 = 8 structural quality states. Medals group them into tiers agents and humans can act on:

Fig 1. Topos quality lattice: eight structural states grouped into Gold / Silver / Bronze / Slop tiers.

The Topos Architecture & Case Study

Topos separates structural analysis into three layers—from parsing code into graphs, to scoring a single file, to comparing versions:

Representations: Programs aren't just text. Topos parses source into graph representations (AST, CFG, CPG, and MDG), the same structures behind the SIMPLE, COMPOSABLE, and SECURE pillars.
Probes: Point-in-time metrics on a single graph—cyclomatic complexity on a CFG, Martin instability on an MDG. They answer “how is this file doing right now?”
Profunctors (comparisons): Relational checks between two program versions—or between code and its tests. Where probes answer “how is this file right now?”, profunctors answer “what structurally changed, and does the test suite still match the production logic?” They power topos compare and Structural Test Coverage.
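The idea of a relational check between two versions can be illustrated with a much simpler stand-in: comparing parsed syntax trees. This sketch assumes AST equality as the notion of "structurally unchanged", which is far cruder than the graph comparisons described above, and `same_structure` is a hypothetical name rather than a Topos function.

```python
import ast


def same_structure(before, after):
    """Return True when two Python sources parse to identical ASTs.

    Comments, blank lines, and redundant parentheses never reach the AST,
    and ast.dump omits line and column positions by default, so purely
    cosmetic edits compare as equal.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


ORIGINAL = "def f(x):\n    if x > 0:\n        return x\n    return -x\n"

# Same logic, padded with a comment, a blank line, and extra parentheses.
PADDED = "# tidy up\ndef f(x):\n\n    if (x > 0):  # positive\n        return x\n    return -x\n"

# A real structural change: the branch becomes a conditional expression.
REWRITTEN = "def f(x):\n    return x if x > 0 else -x\n"

if __name__ == "__main__":
    print(same_structure(ORIGINAL, PADDED))
    print(same_structure(ORIGINAL, REWRITTEN))
```

This is also a small picture of the anti-gaming argument: padding a file changes its text but not its parsed structure, so a structure-based comparison reports no change.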

Case Study: Grading Code You Already Trust

A scorer is only believable once you watch it run on code you already have opinions about. So treat this as a case study, not a calibration. The pillar thresholds themselves were set on a much larger, multi-language corpus; we'll detail that work in a future, more technical post. Here we simply point Topos v0.3.0 at three libraries developers reach for every day (requests, numpy, and pandas) and score every parseable file to see how familiar code lands against the same bar.

1,927 Python files scored across 3 trusted libraries and 3 structural pillars.

Each file earns three independent scores from 0 to 100 (higher is better), one per pillar:

Simplicity: Is complexity under control (branching, function size, structure)?
Secure: Does it avoid risky API patterns?
Composable: Are module dependencies clean, or tangled? (Requires a one-time repo map via GitNexus.)

Read the results below as a sanity check: how does code we already trust land against thresholds calibrated elsewhere? The answer is reassuring, and revealing about where even well-loved libraries leave structural room to improve.

numpy's COMPOSABLE score of 26 reflects its intentionally monolithic structure, which is acceptable for a library you fork once and pin forever but a structural liability in code an agent regenerates weekly. High fan-out modules like numpy/core have instability scores approaching 1.0: they depend on almost everything and are depended on by almost nothing, which makes them cheap to write but fragile, because a change in any of their many dependencies can ripple into them.

Library	Simplicity	Secure	Composable	Files scanned
requests	46	90	69	19
numpy	41	90	26	487
pandas	35	98	15	1,421

Fig 2. Average structural scores by library.

Case study: Topos v0.3.0 scored 1,927 parseable files across requests, numpy, and pandas. Pillar thresholds were calibrated separately on a larger multi-language corpus.

Try It On Your Codebase

Topos meets you where you review code—start in your editor, drop to the terminal, or wire it into an agent. Each path stands on its own.

VS Code extension (MCP server): get quality medals and pillar scores inline in VS Code, inside agents and chat, as you review. Install the extension or view it on GitHub.

Coding agents: the MCP server exposes Topos quality targets to any coding agent. See the setup guide.

Terminal: install the CLI, then evaluate any repo.

curl -fsSL https://docs.krv.ai/topos/install.sh | sh

topos evaluate . -r

The Judgment Gap

The agentic era doesn't have a code shortage. It has a judgment gap. Agents ship faster than understanding travels—and the distance between generation and comprehension is exactly where debt accumulates.

Topos closes that gap concretely. An agent generates a file and Topos returns a verdict: GOLD, SILVER, BRONZE, or SLOP. Not a list of vague complaints, but the same structural scorecard we just ran across production-grade libraries like numpy, pandas, and requests. If a file lands at BRONZE because it fails SIMPLE on cyclomatic complexity (too many branches), the scorecard tells the agent exactly what to fix to reach SILVER: split the function, reduce nesting, or extract a helper. Every step is measurable, every target is real.

Concrete structural targets let agents build toward maintainability, not just a green test suite. You can't prompt away bad architecture—but you can measure it.

The code your agents write today is the system your team will diagnose at 2 a.m. Measure it now.

Topos v0.3.0 is live and open for exploration. Follow @krv_labs for updates.