Pulsar: Robust Topology at Scale
From peer-reviewed research in Nature Energy to an open-source tool for real-world analysis. Pulsar brings the THEMA algorithm to researchers, data scientists, and ML practitioners in a form that is fast enough for large, complex datasets and accessible without specialized topological tooling.
Origin Story
In early 2025, we published research in Nature Energy introducing the Thema algorithm, a new approach to extracting robust structure from complex, high-dimensional datasets. The paper applied Thema to the question of how to accelerate US coal plant retirements, but the underlying method has far broader potential.
Pulsar is the productized form of that research: a high-performance Rust implementation that makes robust topological data analysis accessible to anyone — not just researchers with deep programming expertise.
The Problem with Standard Approaches
When analyzing complex datasets, researchers typically rely on dimensionality reduction techniques like t-SNE, UMAP, or PCA. These tools produce visualizations and embeddings that can reveal structure — but they come with a fundamental limitation.
These embeddings are fragile. Adjust a hyperparameter slightly, and carefully identified clusters may dissolve entirely. This sensitivity raises an uncomfortable question: are the patterns you see genuine structure, or artifacts of a fortunate parameter choice?
Traditional clustering often reduces to "picking the pretty embedding" — selecting whichever visualization appears most interpretable, even when slight parameter changes would tell a completely different story.
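To make the fragility concrete, here is a minimal sketch in plain Python. It is not taken from Pulsar or from any embedding library: the toy points, the `epsilon_components` helper, and the epsilon values are all illustrative assumptions. It links points that are closer than a distance threshold and counts the resulting groups, showing how a small change in that single parameter changes the answer.

```python
import math
from itertools import combinations


def epsilon_components(points, eps):
    """Count connected components when points closer than eps are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < eps:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Hypothetical toy data: two groups of three points, spaced 4 apart,
# with a gap of 10 between the groups.
points = [(0, 0), (4, 0), (8, 0), (18, 0), (22, 0), (26, 0)]

for eps in (3.5, 4.5, 9.5, 10.5):
    print(eps, epsilon_components(points, eps))
# 3.5 -> 6 groups, 4.5 -> 2 groups, 9.5 -> 2 groups, 10.5 -> 1 group
```

Moving epsilon from 3.5 to 4.5, or from 9.5 to 10.5, changes the number of groups, while everything in between tells the same story. A single run cannot say which regime you happened to land in.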
Thema: Embracing Variation
Thema takes a fundamentally different approach. Rather than fighting parameter sensitivity, it treats that sensitivity as signal. The algorithm:
- Constructs many graphs across a systematic sweep of parameters (PCA dimensions, epsilon values, random seeds)
- Accumulates evidence from all configurations into a single cosmic graph
- Retains only the structure that persists consistently across the entire multiverse of parameter settings
The result is a robust, reproducible representation — one that captures genuine data topology rather than parameter artifacts.
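The three steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Pulsar's implementation or API: a random projection stands in for PCA, an epsilon-neighborhood graph stands in for whatever graph construction Thema uses, and the function names (`random_projection`, `epsilon_edges`, `cosmic_weights`) are hypothetical. Each edge's weight is the fraction of configurations in which it appears.

```python
import math
import random
from collections import Counter
from itertools import combinations, product


def random_projection(points, out_dim, seed):
    """Project points onto out_dim random unit directions (stand-in for PCA)."""
    rng = random.Random(seed)
    in_dim = len(points[0])
    axes = []
    for _ in range(out_dim):
        axis = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
        norm = math.hypot(*axis)
        axes.append([a / norm for a in axis])
    return [tuple(sum(a * x for a, x in zip(axis, p)) for axis in axes)
            for p in points]


def epsilon_edges(points, eps):
    """Edges (i, j) with i < j between points closer than eps."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) < eps}


def cosmic_weights(points, dims, epsilons, seeds):
    """Fraction of (dim, eps, seed) configurations in which each edge appears."""
    configs = list(product(dims, epsilons, seeds))
    counts = Counter()
    for dim, eps, seed in configs:
        counts.update(epsilon_edges(random_projection(points, dim, seed), eps))
    return {edge: n / len(configs) for edge, n in counts.items()}


# Hypothetical toy data: two tight groups of three points, far apart in 3-D.
points = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.3, 0.0),
          (10.0, 10.0, 10.0), (10.3, 10.0, 10.0), (10.0, 10.3, 10.0)]

weights = cosmic_weights(points, dims=(1, 2), epsilons=(1.0, 1.5), seeds=range(10))
for edge, weight in sorted(weights.items()):
    print(edge, round(weight, 2))
```

In this toy setup, edges inside a group appear in every configuration, so their weight is 1.0. Edges between the two groups appear only when a projection happens to squash the groups together, so they receive lower weights or are absent altogether.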
How THEMA Works
THEMA does not bet everything on one chart or one parameter setting. Instead, it looks at the same dataset many different ways, builds a whole multiverse of graphs, and asks which relationships keep showing up no matter how you tune the parameters. The cosmic graph then folds all of that evidence into one weighted map, so the strongest connections are the ones that stay stable across many possible views of the data.
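One way to read stable groups off such a weighted map is to keep only the edges above a stability threshold and take connected components. The sketch below assumes that approach for illustration only. The source does not say how Pulsar extracts clusters, and the edge weights, the 0.8 threshold, and the `stable_components` helper are all hypothetical.

```python
def stable_components(n_nodes, weights, threshold):
    """Group nodes using only edges whose weight is at least threshold."""
    parent = list(range(n_nodes))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), weight in weights.items():
        if weight >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Made-up weights: fraction of parameter settings in which each edge appeared.
toy_weights = {(0, 1): 1.0, (1, 2): 0.95, (2, 3): 0.10, (3, 4): 0.90, (4, 5): 1.0}

print(stable_components(6, toy_weights, threshold=0.8))
# [[0, 1, 2], [3, 4, 5]]
print(stable_components(6, toy_weights, threshold=0.05))
# [[0, 1, 2, 3, 4, 5]]
```

The edge between nodes 2 and 3 showed up in only a small share of settings, so a strict threshold treats it as a parameter artifact and splits the data into two groups. A permissive threshold keeps it and merges everything.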
What Pulsar Delivers
Pulsar is not a different algorithm from THEMA. It is the same core approach rebuilt for speed and scale, so researchers can run large multiverse sweeps and cosmic-graph analysis on datasets that would otherwise be too slow or too cumbersome to explore in practice.
In practical terms, that means less time guessing at hyperparameters and more time identifying stable clusters, meaningful relationships, and structures that are worth acting on.
- A Rust core with parallel processing makes THEMA practical at much larger scales. A full 4,000-map sweep completes in approximately 50 seconds.
- Run THEMA-powered analysis through Claude using natural language, without needing to manage the workflow by hand.
- Analyze complex datasets like MMLU that were previously impractical for running THEMA end to end.
AI-Powered Analysis via MCP
The Model Context Protocol (MCP) integration fundamentally changes how you interact with topological analysis. Run Pulsar directly through Claude using natural language:
"Use Pulsar to analyze the penguin dataset at demos/penguins/penguins.csv. What clusters emerge and how do they relate to species?"
Claude handles parameter tuning, executes the topological sweep, and generates a statistical dossier — all without writing a single line of code.
Who This Is For
- Researchers without programming expertise: Run topological analysis through natural language via MCP
- Data scientists: Obtain reproducible, robust clustering that doesn't depend on hyperparameter luck
- ML practitioners: Reveal hidden structure in embeddings and benchmark datasets
- Domain experts: Focus on interpretation rather than parameter tuning
See It in Action
If you want to understand the method in practice, start with our technical deep-dive on the MMLU benchmark. If you want to run Pulsar yourself, the docs and GitHub links above will get you started quickly.
Evaluating LLM Benchmarks with Pulsar