Every system boundary silently destroys information. JSON converts your tuples to lists. YAML coerces your strings to booleans. CSV flattens everything to text. Environment files lose all nesting. These aren't bugs in your code — they're properties of the serialization format. Crossing finds them.
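The first of these losses can be reproduced with nothing but the stdlib `json` module, no `crossing` required:

```python
import json

# JSON has no tuple type, so tuples silently come back as lists
config = {"origin": (0, 0), "size": (800, 600)}
restored = json.loads(json.dumps(config))
print(restored["origin"])                 # [0, 0]
print(type(restored["origin"]).__name__)  # list
```

No exception is raised at any point; the loss is only visible if you compare before and after.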
```bash
pip install crossing
```

Optional format support:

```bash
pip install "crossing[yaml]"  # YAML round-trip testing
pip install "crossing[toml]"  # TOML round-trip testing
pip install "crossing[all]"   # everything
```
```python
from crossing import cross, json_crossing

# Test JSON round-trip with 1000 random samples
report = cross(json_crossing(), samples=1000, seed=42)

print(f"{report.clean_count} clean, "
      f"{report.lossy_count} lossy, "
      f"{report.error_count} errors")
print(f"Total loss events: {report.total_loss_events}")
```
```bash
# Test a single format
crossing test json -n 500 --seed 42

# Test all built-in formats
crossing test -n 200

# Compare how two formats compose
crossing compose json csv -n 300

# Measure how loss scales with repeated crossings
crossing scale json --max-n 5

# List all available crossings
crossing list
```
| Loss Type | Example |
|---|---|
| `type_change` | tuple → list (JSON), int → float (JSON) |
| `missing_key` | int dict key → string key (JSON), None key → `"null"` (JSON) |
| `added_key` | int key `42` appears as string key `"42"` |
| `truncation` | string truncated to N characters |
| `value_change` | NaN → null (JSON), precision loss (float) |
| `length_change` | list length changed during round-trip |
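The `missing_key`/`added_key` pair in the table above is observable directly in the stdlib: JSON coerces every dict key to a string, so the original key vanishes and a new one appears in its place.

```python
import json

# Non-string dict keys are coerced to strings on the way out
before = {1: "one", None: "nothing"}
after = json.loads(json.dumps(before))
print(after)  # {'1': 'one', 'null': 'nothing'}

assert 1 not in after               # original int key is missing...
assert "1" in after                 # ...and a new string key was added
assert after["null"] == "nothing"   # the None key became the string "null"
```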
| Name | What it tests |
|---|---|
| `json` | JSON round-trip (lenient, uses `default=str`) |
| `json-strict` | JSON round-trip (raises on non-native types) |
| `pickle` | Pickle round-trip (lossless baseline) |
| `yaml` | YAML `dump`/`safe_load` round-trip |
| `toml` | TOML round-trip |
| `csv` | CSV round-trip (flat dicts only) |
| `env` | Environment file round-trip |
| `url` | URL query string round-trip |
| `str` | `str()` → `eval()` round-trip |
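The "flat dicts only" caveat on the `csv` crossing comes from the format itself: CSV has exactly one type, text. A stdlib sketch of why it is lossy even on scalars:

```python
import csv
import io

# A flat dict with mixed types crosses the CSV boundary
row = {"name": "ada", "age": 36, "active": True}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=row.keys())
writer.writeheader()
writer.writerow(row)

buf.seek(0)
restored = next(csv.DictReader(buf))
print(restored)  # {'name': 'ada', 'age': '36', 'active': 'True'}
```

Every value comes back as a string; the int and the bool are gone unless you re-parse them yourself.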
```python
from crossing import Crossing, cross

# Define your own boundary
my_boundary = Crossing(
    encode=lambda d: my_serialize(d),
    decode=lambda s: my_deserialize(s),
    name="my format",
)

report = cross(my_boundary, samples=500)
print(report)
```
Chain crossings to test data flowing through multiple boundaries:
```python
from crossing import compose, cross, json_crossing, csv_crossing

# What happens when data goes JSON → CSV?
pipeline = compose(json_crossing(), csv_crossing())
report = cross(pipeline, samples=500)
# Reveals cumulative losses from both boundaries
```
Compare how two different boundaries handle the same data:
```python
from crossing import diff, json_crossing, pickle_crossing

report = diff(
    json_crossing("lenient"),
    pickle_crossing("lossless"),
    samples=500,
)
print(f"{report.divergent_count} samples differ")
```
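The kind of divergence `diff` counts can be seen with the stdlib alone. The same sample round-trips cleanly through pickle but not through lenient JSON (a sketch of the phenomenon, not the library's internals):

```python
import json
import pickle

sample = {"pair": (1, 2), "when": None}

via_pickle = pickle.loads(pickle.dumps(sample))
via_json = json.loads(json.dumps(sample, default=str))

print(via_pickle == sample)  # True: pickle round-trips Python types exactly
print(via_json == sample)    # False: the tuple came back as a list
```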
Measure how loss rate changes when data passes through N copies of a boundary:
```python
from crossing import scaling, json_crossing

sr = scaling(json_crossing(), max_n=5, samples=200)
# Reveals: JSON is idempotent (loss happens once, then saturates)
# Non-idempotent crossings show positive scaling exponents
```
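The idempotency claim is easy to check by hand with the stdlib (a sketch of the saturation behavior, not the `scaling` implementation; `json_boundary` is a hypothetical helper mirroring the lenient `default=str` crossing):

```python
import json

def json_boundary(value):
    # One crossing of a lenient JSON boundary
    return json.loads(json.dumps(value, default=str))

sample = {"pair": (1, 2), 1: "one"}
once = json_boundary(sample)
twice = json_boundary(once)

print(once == sample)  # False: the first crossing is lossy
print(once == twice)   # True: further crossings change nothing
```

Once the data has been squeezed into JSON's type system, it is a fixed point: crossing again loses nothing new, which is why the loss curve saturates after N=1.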
Compare three or more formats simultaneously to distinguish inherent data limitations from format-specific losses. Inspired by the RLDC phase transition: two-query round-trip testing has provable limitations, while three-query comparison enables triangulation.
```python
from crossing import triangulate, json_crossing, csv_crossing, env_file_crossing

report = triangulate(
    json_crossing(), csv_crossing(), env_file_crossing(),
    samples=200, seed=42,
)
report.print()
# Shared losses: inherent to the data (all formats lose it)
# Unique losses: format-specific (only one format loses it)
```
CLI: `crossing triangulate json csv env`
Measure how loss rate varies with input complexity. Reveals the "phase boundary" between lossless and lossy data for each format.
```python
from crossing import profile, json_crossing

report = profile(json_crossing(), max_depth=6, samples=200, seed=42)
report.print()
# Shows loss rate at each nesting depth
# JSON: type diversity drives loss, not nesting depth
# Pickle: 0% loss at all depths
# CSV: lossy even on scalars (63%+)
```
CLI: `crossing profile json`
Run a comprehensive analysis combining test, complexity profile, and scaling in one call:
```python
from crossing import full_report, json_crossing

fr = full_report(json_crossing(), samples=200, seed=42)
fr.print()
# Outputs: round-trip test, complexity profile,
# scaling analysis, idempotency check, and verdict
```
CLI: `crossing report json`
The Friday API provides a Crossing endpoint for analyzing Python packages:
```bash
curl https://api.fridayops.xyz/crossing/package/flask | python3 -m json.tool
```
Returns semantic exception analysis: which exceptions a package raises, where, how they're handled, and how much information is lost at each boundary.
Crossing's semantic scanner finds exception polymorphism in real codebases — places where the same exception type carries different meanings that handlers can't distinguish.
| Project | Files | Raises | Handlers | Crossings | Info Loss | Worst Finding |
|---|---|---|---|---|---|---|
| Pydantic | 105 | 513 | 263 | 108 | 13.7 bits | AttributeError: 75% collapse, TypeError: 181 raises |
| Django | 899 | 1980 | 1217 | 79 | 47.7 bits | ValueError: 483 raises, 65% collapse |
| Celery | 161 | 292 | 554 | 40 | 26.0 bits | KeyError: 19 raises, 112 handlers |
| pytest | 71 | 270 | 277 | 37 | 19.8 bits | TypeError: 50 raises, most polymorphic |
| Requests | 18 | 61 | 70 | 24 | 6.0 bits | InvalidURL: 7 raises, 100% collapse |
| pylint | 179 | — | — | 18 | 7.1 bits | ValueError: 10 raises, medium risk |
| Flask | 24 | 84 | 42 | 15 | 4.0 bits | NoAppException: 100% collapse |
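One way to read the "Info Loss" column: when a handler catches N distinguishable raise sites but cannot tell them apart, log2(N) bits of provenance are destroyed at that boundary. A back-of-the-envelope sketch of that reading (my interpretation of the metric, not the scanner's exact formula):

```python
import math

def collapse_bits(raise_sites: int) -> float:
    # N indistinguishable raise sites collapsed into one handler
    # lose log2(N) bits of provenance
    return math.log2(raise_sites) if raise_sites > 1 else 0.0

# e.g. a single bare `except KeyError` over Celery's 19 KeyError raise sites
print(round(collapse_bits(19), 1))  # 4.2
```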
Run your own:

```bash
pip install crossing && crossing-semantic /path/to/codebase
```
Every time data crosses a system boundary, information can be silently lost. The loss isn't random — it's structural, determined by the format's type system and the data's actual types. A tuple becomes a list. An integer key becomes a string. A NaN becomes null. None of these trigger errors. All of them change the meaning of your data.
Crossing makes these invisible losses visible and measurable.