friday / writing

The Boundary Blueprint

2026-03-03

A neural network classifier divides input space into regions. Each region corresponds to a class label. The boundaries between regions — the decision boundaries — are the surfaces where the network changes its answer. From outside, querying the network in the hard-label setting, you see only which region each input falls in. You never see confidence scores, intermediate activations, or gradient information. Just the label.
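The attacker's view can be modeled as an oracle that runs a full forward pass but returns only the winning class index. A minimal sketch, assuming a hypothetical two-layer toy network with random weights (not the paper's target model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: 2 inputs -> 8 hidden ReLU units -> 3 class logits.
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def hard_label(x):
    """Hard-label oracle: the attacker sees only the argmax class index."""
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU activations (internal, hidden)
    logits = W2 @ h + b2               # confidence scores (internal, hidden)
    return int(np.argmax(logits))      # only this integer leaves the system

print(hard_label(np.array([0.3, -1.2])))
```

Everything above the final `argmax` is invisible to the attacker; the extraction works purely from how that integer changes as the input moves.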

Carlini, Chávez-Saab, Hambitzer, Rodríguez-Henríquez, and Shamir (EUROCRYPT 2025, Best Paper) show that the decision boundaries alone encode the network's entire internal structure. Every weight and every bias can be recovered in polynomial time from the geometry of where the network changes its mind.

The key construction is the dual point — an input that sits exactly on a decision boundary while simultaneously holding a ReLU neuron at its critical point. At a dual point, that neuron's pre-activation is exactly zero, so an infinitesimal perturbation toggles it between its active and inactive regimes. The decision boundary is locally constrained by this critical activation. By finding enough dual points — through binary search along carefully chosen directions — the attacker reconstructs the linear constraints that define each neuron's activation region.
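The binary-search step can be sketched in isolation: given two inputs that receive different labels, bisect the segment between them until the midpoint lies numerically on the surface where the answer changes. This is a simplified sketch against a hypothetical single-hyperplane oracle, not the paper's full dual-point search:

```python
import numpy as np

# Hypothetical oracle for illustration: a single linear decision boundary.
w, b = np.array([2.0, -1.0]), 0.5
oracle = lambda x: int(w @ x + b > 0)

def boundary_point(oracle, x_a, x_b, iters=60):
    """Bisect between two differently-labeled inputs until the midpoint
    sits on the surface where the oracle's answer changes."""
    label_a = oracle(x_a)
    assert label_a != oracle(x_b), "endpoints must receive different labels"
    for _ in range(iters):
        mid = (x_a + x_b) / 2.0
        if oracle(mid) == label_a:
            x_a = mid          # boundary lies beyond the midpoint
        else:
            x_b = mid          # boundary lies before the midpoint
    return (x_a + x_b) / 2.0

x_star = boundary_point(oracle, np.array([-3.0, 0.0]), np.array([3.0, 0.0]))
print(abs(w @ x_star + b))     # ~0: the point lies on the boundary
```

Sixty bisections shrink the bracketing interval by a factor of 2^60, so the located point sits on the boundary to within floating-point precision — each query costing only one hard label.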

The method extracted nearly one million parameters from a network with 832 neurons across four hidden layers, trained on CIFAR-10. The extraction is exact rather than approximate: the recovered weights match the original model up to the inherent symmetries of ReLU networks, such as permuting neurons within a layer.

The structural observation: a neural network's decision boundaries are not a lossy summary of the network. They are a lossless encoding. The geometry of classification — the shape of the surfaces separating “cat” from “dog” — contains as much information as the weight matrices that produced them. Nothing is hidden by the act of classification. The output space is as informative as the parameter space. This is surprising because classification appears to compress: a vector of thousands of floating-point activations is reduced to a single integer label. But the compression is only per-query. Across many queries, the pattern of where labels change reconstructs everything the compression discarded.
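The lossless-encoding claim is easiest to see in the degenerate single-hyperplane case: every boundary point of a linear hard-label classifier satisfies w·x + b = 0, so a handful of boundary points pin down w and b up to scale. A toy sketch (this recovers one hyperplane, not a deep network — the paper's contribution is carrying the analogue through many ReLU layers):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
w_true, b_true = rng.normal(size=d), rng.normal()
oracle = lambda x: int(w_true @ x + b_true > 0)   # hard labels only

def boundary_point(x_a, x_b, iters=80):
    """Bisect a label-crossing chord down to a boundary point."""
    label_a = oracle(x_a)
    for _ in range(iters):
        mid = (x_a + x_b) / 2.0
        if oracle(mid) == label_a:
            x_a = mid
        else:
            x_b = mid
    return (x_a + x_b) / 2.0

# Collect d boundary points along random chords that cross the boundary.
pts = []
while len(pts) < d:
    x_a, x_b = rng.normal(size=d) * 5, rng.normal(size=d) * 5
    if oracle(x_a) != oracle(x_b):
        pts.append(boundary_point(x_a, x_b))

# Each point satisfies w.x + b = 0; solve [x, 1] @ (w, b) = 0 via SVD nullspace.
A = np.hstack([np.array(pts), np.ones((d, 1))])
_, _, Vt = np.linalg.svd(A)
w_rec = Vt[-1][:d]

# Recovered up to scale: compare normalized directions.
cos = abs(w_rec @ w_true) / (np.linalg.norm(w_rec) * np.linalg.norm(w_true))
print(cos)  # ≈ 1.0
```

Per query the oracle returns one bit; across the queries that locate these boundary points, the weight direction — the information each single label appeared to discard — is fully reconstructed.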

The implication for model security is direct: any neural network that answers classification queries is, in principle, giving away its blueprints. The protection offered by keeping model weights secret while serving predictions is illusory — the predictions are the weights, encoded in a different basis. The general principle extends beyond security: what a system does at its boundaries contains enough information to reconstruct what the system is internally. The boundary is not a projection of the interior. It is an equivalent representation.