Exact confidence intervals for randomized experiments with binary outcomes are computationally expensive. The exact approach — testing every possible value of the treatment effect against the randomization distribution — avoids the approximations that make large-sample methods unreliable for small experiments. But brute force requires testing every candidate value, and the number of candidates grows with the sample size.
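To make the brute-force baseline concrete, here is a toy sketch, not the paper's method. Two assumptions are mine: candidate effects are taken as multiples of 1/n on [-1, 1], and `randomization_test_pvalue` is a made-up placeholder standing in for an exact randomization test.

```python
def randomization_test_pvalue(theta, observed_effect, sharpness=2.0):
    # Placeholder p-value: high near the observed effect, low far away.
    # A real randomization test would re-randomize treatment assignments
    # under the sharp null implied by theta; this toy just mimics the shape.
    return max(0.0, 1.0 - sharpness * abs(theta - observed_effect))

def brute_force_ci(observed_effect, n, alpha=0.05):
    # One randomization test per candidate: roughly 2n + 1 tests in total.
    candidates = [k / n for k in range(-n, n + 1)]
    accepted = [t for t in candidates
                if randomization_test_pvalue(t, observed_effect) > alpha]
    return min(accepted), max(accepted)

lo, hi = brute_force_ci(observed_effect=0.2, n=100)
```

Every candidate pays for a full test, which is exactly the cost the paper attacks.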
Cohen, Dasgupta, and Ramdas (arXiv:2602.20498) reduce the computation from brute force to O(log n) randomization tests for balanced Bernoulli and matched-pairs designs. The key insight: the acceptance region of the randomization test is monotone in the candidate value. If the test rejects some candidate θ, it also rejects every candidate farther out in the same direction. The set of accepted values is therefore an interval, and its endpoints can be found by binary search.
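The binary-search idea can be sketched in a few lines, assuming only the monotone rejection property described above. Here `rejects` is a hypothetical oracle (one call = one randomization test), not the paper's actual test, and the candidate grid of multiples of 1/n is an illustrative assumption.

```python
def largest_accepted(rejects, n, start_index=0):
    # Candidates are k/n for k in {-n, ..., n}; `start_index` is an index
    # known to be accepted (e.g. the point estimate). Invariant: `lo` is
    # accepted, `hi` is rejected (n + 1 serves as a rejected sentinel).
    lo, hi = start_index, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if rejects(mid / n):        # one randomization test per iteration
            hi = mid
        else:
            lo = mid
    return lo / n                   # upper endpoint after O(log n) tests

# Toy oracle: the test rejects every candidate above 0.3.
upper = largest_accepted(lambda theta: theta > 0.3, n=100)
```

The lower endpoint is found by the mirror-image search, so the whole interval costs O(log n) tests rather than one test per candidate.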
An information-theoretic lower bound shows this is optimal: no algorithm can do better than O(log n) tests when the monotonicity structure is the only information available. The algorithm is not just fast; it is the fastest possible.
The design matters. Under complete randomization, the same problem requires O(n log n) tests, a sharp separation from the O(log n) for Bernoulli designs. The experimental design determines not just the statistical efficiency of the inference but also its computational complexity: choosing the right design drops the cost from near-linear to logarithmic in n, an exponential saving.
The extension to general Bernoulli designs costs O(n²) — polynomial, but no longer logarithmic. The monotonicity structure that enables binary search is weaker in the general case, and the algorithm pays for it.
The general observation: when a statistical quantity has monotone structure, the search for its boundary can exploit that structure algorithmically. The computational cost of exact inference is not a fixed function of sample size — it depends on the design's geometry. The same statistical guarantee can be exponentially cheaper or more expensive depending on which design generated the data.