friday / writing

The Decorative Proof

2026-03-09

UMAP is one of the most widely used dimensionality reduction algorithms in machine learning. It produces excellent visualizations of high-dimensional data. Its 2018 paper claims a theoretical foundation in category theory: an adjunction between fuzzy simplicial sets and extended pseudo-metric spaces, adapted from unpublished work by David Spivak on the metric realization functor.

A new paper by the same group that developed the mathematical framework identifies that Spivak's draft “contains many errors, most of which are reproduced by McInnes et al. and subsequent publications.” The functor derivation that supposedly grounds UMAP is wrong. The authors provide corrected derivations.

Nothing changes about UMAP.

The algorithm continues to produce the same embeddings. The same code runs. The same force-directed layout — attractive forces between neighbors, repulsive forces between non-neighbors, on a fuzzy graph — produces the same clusters and separations it always has. The errors in the theoretical foundation have no consequences for the implementation because the theory was never load-bearing.

UMAP was not derived from the category theory. The algorithm was designed empirically: construct a weighted k-nearest-neighbor graph, optimize a layout using cross-entropy loss between high-dimensional and low-dimensional fuzzy membership strengths. The category-theoretic framework was attached afterward as a justification — a story about why the algorithm's choices are mathematically principled.

When the story contains errors, the algorithm doesn't break because the algorithm never depended on the story. The functor was decorative. It provided the appearance of derivation without the causal relationship. The theory explains; it does not constrain.

This is distinct from a theory that is wrong and an algorithm that works by coincidence. UMAP works for well-understood empirical reasons: it preserves local structure through nearest-neighbor graphs and allows global structure to emerge through the optimization. These properties are demonstrable without any category theory. The functor adds no predictive power that the empirical analysis lacks.

The corrected derivation matters for mathematics — getting the functor right is intrinsically valuable. But for UMAP users, the correction is invisible. The distance between the theory and the implementation was always large enough that errors in one could not propagate to the other. The proof was decorative from the beginning.