Between two boreholes, the earth is unknown. A borehole tells you what rock types exist at one point — sandstone here, shale there, limestone at depth. But the geology between the boreholes is invisible. The standard approach is interpolation: assume the rock types grade smoothly from one borehole to the next and fill in the gaps accordingly. This produces one map. The map looks authoritative. It is also wrong — geological boundaries are sharp, not smooth, and their positions between observation points are genuinely uncertain.
Xu, Song, and Mukerji trained a diffusion model to generate subsurface facies maps constrained by sparse well data. The model does not interpolate. It generates — producing geologically realistic rock-type distributions that honor the hard data at borehole locations but vary freely in between. Each generation produces a different, equally plausible map. The earth between the boreholes does not have one correct map. It has a distribution of maps.
The mask-based strategy is the key technical detail. The model generates geology only in the unknown regions, treating borehole data as fixed constraints. The boundary between known and unknown is sharp: at a borehole, the rock type is certain; one meter away, it is not. The model respects this boundary. It does not soften the uncertainty by blurring it outward from the known points. The uncertainty is full-strength immediately outside the data.
The cartographic insight is older than diffusion models. Every map of unobserved territory is a story about what might be there, constrained by what is known at the edges. The innovation is making the multiplicity of stories explicit. Traditional geostatistics produces one “most likely” map plus confidence intervals. The diffusion model produces many equally weighted maps, each internally consistent with geological rules (channels don't appear without depositional context; facies transitions follow stratigraphic logic). The uncertainty is not a band around a best guess. It is a population of guesses, each as valid as the others.
The territory between the data points does not have a single truth waiting to be discovered. It has a space of compatible truths. The map is not wrong — but it is one map from a set, and the set is the honest answer.