How many species of leopard frog live in Mexico and Central America? The taxonomy said many — the Rana pipiens complex has accumulated named species over decades, each based on morphological differences between populations in different regions. A frog in the Yucatan looks slightly different from a frog in Oaxaca. A new species is described. The catalog grows.
Chambers and Hillis (PNAS, 2025) asked a sharper question: how many of these named species are actually reproductively isolated? Using genome-scale data and model-based delimitation methods, they tested whether the genetic differences between populations reflect true species boundaries or geographic variation within widespread species.
Ten named species collapsed into synonymy. They were geographic variants of previously described taxa, the product of overdescription driven by morphological differences among geographically separated populations. At the same time, three genuinely new species had gone unrecognized, their distinctiveness invisible to the morphological criteria that had been splitting the others.
The key methodological move is the null hypothesis. In traditional taxonomy, discovering genetic or morphological differences between populations raises the question: is this a new species? In principle, the null hypothesis is one species, and the burden of proof is on distinctiveness. But in practice, the burden drifts. If a population looks different enough and lives in a different place, it gets described. Geographic separation does the work that reproductive isolation should.
Chambers and Hillis make the argument explicit: without positive evidence of reproductive isolation, meaning actual tests for gene flow between geographically contiguous populations, the null hypothesis must remain “one species with geographic variation.” The null hypothesis is not “probably different because they look different.” Looking different is the observation. Reproductive isolation is the mechanism. The observation is not sufficient, and as the three unrecognized species show, it is not even necessary.
This is a general problem in any field that splits categories based on measured difference. The question is never whether two groups differ. Given enough measurement precision, everything differs from everything else. The question is whether the difference represents the kind of boundary you're looking for: in this case, an inability to exchange genes. Ten species' worth of measured differences turned out to be within-species variation. Three species' worth of real boundaries turned out to be invisible to the measurement that was splitting everything else.
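The point about measurement precision can be made concrete with a small Python sketch. It is an illustration only, not anything from the paper: two hypothetical populations whose mean trait values (say, body length in mm) differ by a biologically trivial 0.2 mm against a standard deviation of 2 mm. A simple large-sample z-test for a difference in means (Welch-style standard error) is run at increasing sample sizes. The trait, the numbers, and the function names are all assumptions made for the example.

```python
import random
import statistics
from statistics import NormalDist


def two_sample_z_test(a, b):
    """Two-sided p-value for a difference in means, using a large-sample
    z approximation with a Welch-style (unpooled) standard error."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # cdf(-|z|) avoids the precision loss of 1 - cdf(|z|) for large z
    return 2 * NormalDist().cdf(-abs(z))


def simulate(n, true_diff=0.2, sd=2.0, seed=1):
    """Hypothetical trait sampled from two populations whose true means
    differ by a fixed, trivially small amount (true_diff)."""
    rng = random.Random(seed)
    pop_a = [rng.gauss(60.0, sd) for _ in range(n)]
    pop_b = [rng.gauss(60.0 + true_diff, sd) for _ in range(n)]
    return two_sample_z_test(pop_a, pop_b)


# The true difference never changes; only the sample size does.
for n in (20, 200, 2000, 20000):
    print(f"n={n:>6}  p={simulate(n):.3g}")
```

At small sample sizes the difference is usually undetectable; by tens of thousands of measurements it is overwhelmingly "significant." Nothing about the populations changed, only the precision. A test of "do they differ?" will eventually say yes, which is why it cannot be the criterion for a species boundary.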
The error is systematic. Morphological variation scales with geographic distance. Species descriptions scale with fieldwork coverage. The combination produces a predictable inflation: more fieldwork in morphologically variable groups generates more species names, whether or not the populations are reproductively isolated. The genome-scale analysis doesn't just correct individual errors. It reveals the bias in the method that produced them.
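The inflation argument can be sketched as a toy simulation in Python. This is a hypothetical model, not the paper's analysis. It assumes a single interbreeding species along a 1,000 km transect, a trait that shifts linearly with position (a cline) plus local noise, and a naive description rule: name a new species whenever a surveyed population differs from every existing type specimen by more than a fixed threshold. By construction, every name after the first is an overdescription. The function name and all parameter values are illustrative assumptions.

```python
import random


def count_names(n_sites, range_km=1000.0, cline_per_km=0.01, local_sd=1.0,
                threshold=2.0, seed=0):
    """Hypothetical model: ONE species spread along a transect.

    Each surveyed population's trait value shifts with geography (a linear
    cline) plus local noise. A naive rule names a new species whenever a
    population differs from every existing type specimen by > threshold.
    Gene flow never enters the rule. Returns the number of names produced.
    """
    rng = random.Random(seed)
    types = []  # trait values of the type specimens of each named "species"
    for _ in range(n_sites):
        position = rng.uniform(0.0, range_km)
        trait = 50.0 + cline_per_km * position + rng.gauss(0.0, local_sd)
        if all(abs(trait - t) > threshold for t in types):
            types.append(trait)
    return len(types)


# Survey effort goes up; the true number of species stays at one.
for effort in (1, 5, 20, 100, 500):
    print(f"sites surveyed={effort:>3}  species named={count_names(effort)}")
```

With a fixed seed, the random draws are consumed in the same order, so surveying more sites can only add names, never remove them. The count climbs with survey effort until the sampled trait range is saturated, even though the correct answer is always one species. Widening `range_km` lengthens the cline and tends to allow more names, which mirrors the link between geographic distance and morphological variation.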
The deeper insight is about what the null hypothesis does. A null hypothesis isn't passive. It determines what counts as evidence and how much evidence is enough. “These populations differ” is trivially true, so a null of “no difference” is trivially rejected. “These populations are reproductively isolated” requires a specific, testable claim about gene flow. Switching between these two framings changes the entire taxonomy, not because the data changed, but because the question did.
Chambers, Hillis, et al., "Distinguishing species boundaries from geographic variation," PNAS 122(19):e2423688122 (2025).