Correctness is table stakes. Topos measures whether agent-written code is structurally worth keeping.
Jeremy Wayland and Sidney Gathrid
Read Field NoteTechnical notes, deployment lessons, and opinions on building evaluation infrastructure for high-stakes AI.
Correctness is table stakes. Topos measures whether agent-written code is structurally worth keeping.
Jeremy Wayland and Sidney Gathrid
Read Field NoteAcademic and industry benchmarks are falling behind what agentic systems can do in the field.
Jeremy Wayland
An opinion piece on the chasm between research performance and clinical deployment.
Jeremy Wayland
Nature Medicine calls for better evidence. Here is what that looks like.
Jeremy Wayland and Sidney Gathrid
MMLU multiverse analysis reveals hidden structure
Jeremy Wayland and Sidney Gathrid
From peer-reviewed research to open-source tool
Sidney Gathrid
Jeremy builds a DTI pipeline and wins the Novartis Challenge.
Jeremy Wayland
In collaboration with UCSB Environmental Studies & Bren School
Sidney Gathrid and Jeremy Wayland