Pulsar: Robust Topology at Scale

Pulsar brings the THEMA algorithm to researchers, data scientists, and ML practitioners in a form that is fast enough for large, complex datasets and accessible without specialized topological tooling.

Get started with the Pulsar docs or browse the source on GitHub.

Origin Story

In early 2025, we published research in Nature Energy introducing the Thema algorithm, a new approach to extracting robust structure from complex, high-dimensional datasets. The paper applied Thema to the question of accelerating US coal plant retirements, but the underlying method had far broader potential.

Pulsar is the productized form of that research: a high-performance Rust implementation that makes robust topological data analysis accessible to anyone — not just researchers with deep programming expertise.

The Problem with Standard Approaches

When analyzing complex datasets, researchers typically rely on dimensionality reduction techniques like t-SNE, UMAP, or PCA. These tools produce visualizations and embeddings that can reveal structure — but they come with a fundamental limitation.

These embeddings are fragile. Adjust a hyperparameter slightly, and carefully identified clusters may dissolve entirely. This sensitivity raises an uncomfortable question: are the patterns you see genuine structure, or artifacts of a fortunate parameter choice?
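To make the fragility concrete, here is a toy illustration in Python. It is not t-SNE, UMAP, or PCA: it assumes a much simpler stand-in, an epsilon-neighborhood graph over six points on a line, and counts connected components as "clusters." The function name and the data are invented for this sketch.

```python
from itertools import combinations


def count_clusters(points, eps):
    """Count connected components of the graph linking points within eps."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= eps:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Toy 1-D data: two runs of evenly spaced points with a slightly wider gap between them.
points = [0, 10, 20, 32, 42, 52]
cluster_counts = {eps: count_clusters(points, eps) for eps in (9, 11, 13)}
print(cluster_counts)  # {9: 6, 11: 2, 13: 1}
```

Three nearby values of one parameter give six, two, and one cluster on the same data. Nothing in a single run tells you which answer to trust.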

Thema: Embracing Variation

Thema takes a fundamentally different approach. Rather than fighting parameter sensitivity, it exploits it as signal. The algorithm:

Constructs many graphs across a systematic sweep of parameters (PCA dimensions, epsilon values, random seeds)
Accumulates evidence from all configurations into a single cosmic graph
Retains only the structure that persists consistently across the entire multiverse of parameter settings

The result is a robust, reproducible representation — one that captures genuine data topology rather than parameter artifacts.
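The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pulsar's or Thema's actual implementation: keeping the first `dims` coordinates stands in for a PCA projection, a small seeded jitter stands in for seed-dependent randomness, and each graph is a plain epsilon-neighborhood graph. The names `build_graph` and `cosmic_graph`, the 0.9 cutoff, and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations, product


def build_graph(points, dims, eps, seed):
    """One map: jitter the points, keep the first `dims` coordinates, link pairs within eps."""
    rng = random.Random(seed)
    view = [tuple(x + rng.uniform(-0.02, 0.02) for x in p[:dims]) for p in points]
    return {
        (i, j)
        for i, j in combinations(range(len(view)), 2)
        if math.dist(view[i], view[j]) <= eps
    }


def cosmic_graph(points, dims_grid, eps_grid, seeds):
    """Weight each edge by the fraction of maps in which it appears."""
    configs = list(product(dims_grid, eps_grid, seeds))
    counts = Counter()
    for dims, eps, seed in configs:
        counts.update(build_graph(points, dims, eps, seed))
    return {edge: n / len(configs) for edge, n in counts.items()}


# Toy data: two tight groups, plus point 3, which only looks close to the
# first group when the third coordinate is dropped.
points = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 5),
    (10, 10, 0), (11, 10, 0), (10, 11, 0),
]
weights = cosmic_graph(points, dims_grid=(2, 3), eps_grid=(1.2, 1.6), seeds=(0, 1, 2))

# Keep only edges present in at least 90% of the 12 maps (hypothetical cutoff).
stable_edges = sorted(edge for edge, w in weights.items() if w >= 0.9)
print(stable_edges)  # [(0, 1), (0, 2), (4, 5), (4, 6)]
```

In this toy sweep, the edges tying point 3 to the first group appear in only half of the maps (weight 0.5), because they exist only in the two-dimensional views. They are dropped, while the edges that show up in every configuration are kept.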

How THEMA Works

THEMA does not bet everything on one chart or one parameter setting. Instead, it looks at the same dataset many different ways, builds a whole multiverse of graphs, and asks which relationships keep showing up no matter how you tune the model. The cosmic graph then folds all of that evidence into one weighted map, so the strongest connections are the ones that stay stable across many possible views of the data.

Fig. 1. The THEMA pipeline: a multiverse of parameter sweeps collapses into a single cosmic graph, retaining only structure that persists across configurations.

What Pulsar Delivers

Pulsar is not a different algorithm from THEMA. It is the same core approach rebuilt for speed and scale, so researchers can run large multiverse sweeps and cosmic-graph analysis on datasets that would otherwise be too slow or too cumbersome to explore in practice.

In practical terms, that means less time guessing at hyperparameters and more time identifying stable clusters, meaningful relationships, and structures that are worth acting on.

100x Faster Than Python: A Rust core with parallel processing makes THEMA practical at much larger scales. A full 4,000-map sweep completes in approximately 50 seconds.
AI-Assisted Analysis (MCP): Run THEMA-powered analysis through Claude using natural language, without needing to manage the workflow by hand.
14k+ Points at Scale: Analyze complex datasets like MMLU that were previously impractical for running THEMA end to end.

AI-Powered Analysis via MCP

The Model Context Protocol (MCP) integration fundamentally changes how you interact with topological analysis. Run Pulsar directly through Claude using natural language:

Example prompt

“Use Pulsar to analyze the penguin dataset at demos/penguins/penguins.csv. What clusters emerge and how do they relate to species?”

Claude handles parameter tuning, executes the topological sweep, and generates a statistical dossier — all without writing a single line of code.
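For readers curious about what happens underneath, MCP clients invoke server tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds one such request in Python. The envelope follows the MCP specification, but the tool name `run_sweep` and the argument name `csv_path` are hypothetical placeholders; Pulsar's actual tool names and parameters are documented in the Pulsar docs, not here.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = make_tool_call(
    1,
    "run_sweep",  # hypothetical tool name, for illustration only
    {"csv_path": "demos/penguins/penguins.csv"},  # hypothetical argument name
)
payload = json.dumps(request)
print(payload)
```

In normal use you never write this message yourself. Claude composes the request from your prompt and interprets the tool's response for you.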

Who This Is For

Researchers without programming expertise: Run topological analysis through natural language via MCP.
Data scientists: Obtain reproducible, robust clustering that doesn't depend on hyperparameter luck.
ML practitioners: Reveal hidden structure in embeddings and benchmark datasets.
Domain experts: Focus on interpretation rather than parameter tuning.

See It in Action

If you want to understand the method in practice, start with our technical deep-dive on the MMLU benchmark. If you want to run Pulsar yourself, the docs and GitHub links above will get you started quickly.