crossing

Detect silent information loss at system boundaries.
PyPI v1.6.0 · GitHub · Python ≥ 3.10 · MIT License

Every system boundary silently destroys information. JSON converts your tuples to lists. YAML coerces your strings to booleans. CSV flattens everything to text. Environment files lose all nesting. These aren't bugs in your code — they're properties of the serialization format. Crossing finds them.
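You can watch the simplest of these losses happen with nothing but the standard library:

```python
import json

# A tuple survives Python, but not a JSON round-trip.
before = {"point": (3, 4)}
after = json.loads(json.dumps(before))

print(type(before["point"]).__name__)  # tuple
print(type(after["point"]).__name__)   # list
print(before == after)                 # False: (3, 4) != [3, 4]
```

No exception was raised, yet the data that came back is not the data that went in. That is the class of bug Crossing is built to surface.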

Install

pip install crossing

Optional format support:

pip install crossing[yaml]   # YAML round-trip testing
pip install crossing[toml]   # TOML round-trip testing
pip install crossing[all]    # everything

Quick Start

Python API

from crossing import cross, json_crossing

# Test JSON round-trip with 1000 random samples
report = cross(json_crossing(), samples=1000, seed=42)

print(f"{report.clean_count} clean, "
      f"{report.lossy_count} lossy, "
      f"{report.error_count} errors")
print(f"Total loss events: {report.total_loss_events}")

CLI

# Test a single format
crossing test json -n 500 --seed 42

# Test all built-in formats
crossing test -n 200

# Compare how two formats compose
crossing compose json csv -n 300

# Measure how loss scales with repeated crossings
crossing scale json --max-n 5

# List all available crossings
crossing list

What It Finds

| Loss Type | Example |
|---|---|
| type_change | tuple → list (JSON), int → float (JSON) |
| missing_key | int dict key → string key (JSON), None key → "null" (JSON) |
| added_key | int key 42 appears as string key "42" |
| truncation | string truncated to N characters |
| value_change | NaN → null (JSON), precision loss (float) |
| length_change | list length changed during round-trip |
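The missing_key / added_key pair shows up together whenever JSON coerces a non-string dict key — observable directly with the stdlib:

```python
import json

# JSON object keys must be strings, so int and None keys are coerced silently.
before = {42: "answer", None: "nothing"}
after = json.loads(json.dumps(before))

print(after)          # {'42': 'answer', 'null': 'nothing'}
print(42 in after)    # False — the int key is missing
print("42" in after)  # True  — a string key appeared in its place
```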

Built-in Crossings

| Name | What it tests |
|---|---|
| json | JSON round-trip (lenient, uses default=str) |
| json-strict | JSON round-trip (crashes on non-native types) |
| pickle | Pickle round-trip (lossless baseline) |
| yaml | YAML dump/safe_load round-trip |
| toml | TOML round-trip |
| csv | CSV round-trip (flat dict only) |
| env | Environment file round-trip |
| url | URL query string round-trip |
| str | str() → eval() round-trip |

Custom Crossings

from crossing import Crossing, cross

# Define your own boundary
my_boundary = Crossing(
    encode=lambda d: my_serialize(d),
    decode=lambda s: my_deserialize(s),
    name="my format"
)

report = cross(my_boundary, samples=500)
print(report)
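A boundary is nothing more than an encode/decode pair. Before wiring one into the library, you can sanity-check it by hand — here with a hypothetical "key=value lines" format (the functions are illustrative, not part of Crossing):

```python
# Hypothetical flat text format: one key=value pair per line.
def encode(d: dict) -> str:
    return "\n".join(f"{k}={v}" for k, v in d.items())

def decode(s: str) -> dict:
    return dict(line.split("=", 1) for line in s.splitlines())

before = {"retries": 3, "debug": True}
after = decode(encode(before))
print(after)  # {'retries': '3', 'debug': 'True'} — every value became str
```

A single manual round-trip already reveals the format's signature loss (type_change to str on every value); running it through cross() measures how often that loss bites across random inputs.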

Composition

Chain crossings to test data flowing through multiple boundaries:

from crossing import compose, cross, json_crossing, csv_crossing

# What happens when data goes JSON → CSV?
pipeline = compose(json_crossing(), csv_crossing())
report = cross(pipeline, samples=500)
# Reveals cumulative losses from both boundaries
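The effect of stacking boundaries can be reproduced by hand with the stdlib — JSON first, then a CSV round-trip via StringIO:

```python
import csv
import io
import json

data = {"name": "ada", "count": 7, "tags": ["a", "b"]}

# Boundary 1: JSON round-trip — types mostly survive here.
step1 = json.loads(json.dumps(data))

# Boundary 2: CSV round-trip — CSV only knows flat rows of strings.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=step1.keys())
writer.writeheader()
writer.writerow(step1)
buf.seek(0)
step2 = next(csv.DictReader(buf))

print(step2)  # {'name': 'ada', 'count': '7', 'tags': "['a', 'b']"}
```

The int became a string and the list became its repr — losses the JSON leg alone would never have shown.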

Diff

Compare how two different boundaries handle the same data:

from crossing import diff, json_crossing, pickle_crossing

report = diff(
    json_crossing("lenient"),
    pickle_crossing("lossless"),
    samples=500
)
print(f"{report.divergent_count} samples differ")
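The same comparison can be checked by hand on a single sample, without the library, using only json and pickle:

```python
import json
import pickle

sample = {"pair": (1, 2)}

via_json = json.loads(json.dumps(sample))
via_pickle = pickle.loads(pickle.dumps(sample))

print(via_pickle == sample)  # True  — pickle is the lossless baseline
print(via_json == sample)    # False — the tuple became a list
```

diff() automates exactly this: generate samples, push each through both boundaries, and count where the results disagree.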

Scaling Analysis

Measure how loss rate changes when data passes through N copies of a boundary:

from crossing import scaling, json_crossing

sr = scaling(json_crossing(), max_n=5, samples=200)
# Reveals: JSON is idempotent (loss happens once, then saturates)
# Non-idempotent crossings show positive scaling exponents
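The idempotency claim is easy to verify by hand: apply the round-trip repeatedly and watch when the data stops changing.

```python
import json

def round_trip(d):
    return json.loads(json.dumps(d))

data = {1: "a", "pair": (0, 0)}
passes = [data]
for _ in range(5):
    passes.append(round_trip(passes[-1]))

# Loss happens on the first crossing; later crossings are no-ops.
print(passes[0] == passes[1])  # False — first pass mutates the data
print(passes[1] == passes[5])  # True  — stable from then on
```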

Triangulation

Compare 3+ formats simultaneously to distinguish inherent data limitations from format-specific losses. Inspired by the RLDC phase transition: two-query round-trip testing has provable limitations, while three-query comparison enables triangulation.

from crossing import triangulate, json_crossing, csv_crossing, env_file_crossing

report = triangulate(
    json_crossing(), csv_crossing(), env_file_crossing(),
    samples=200, seed=42,
)
report.print()
# Shared losses: inherent to the data (all formats lose it)
# Unique losses: format-specific (only one format loses it)

CLI: crossing triangulate json csv env
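A toy version of the idea, with three hand-rolled boundaries (the flat "text" format here is a stand-in for env/CSV, not a Crossing built-in):

```python
import json
import pickle

def via_json(d):
    return json.loads(json.dumps(d))

def via_pickle(d):
    return pickle.loads(pickle.dumps(d))

def via_text(d):
    # Stand-in flat text format: stringifies every value.
    return {k: str(v) for k, v in d.items()}

sample = {"pair": (1, 2), "n": 7, "name": "ada"}
formats = {"json": via_json, "pickle": via_pickle, "text": via_text}

# For each field, which formats fail to round-trip it?
losers = {k: sorted(name for name, f in formats.items() if f(sample)[k] != v)
          for k, v in sample.items()}
print(losers)  # {'pair': ['json', 'text'], 'n': ['text'], 'name': []}
```

With a lossless format like pickle in the set, nothing is lost everywhere — so every loss observed is attributable to a specific format rather than to the data itself. That attribution is what the third query buys you.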

Complexity Profiling

Measure how loss rate varies with input complexity. Reveals the "phase boundary" between lossless and lossy data for each format.

from crossing import profile, json_crossing

report = profile(json_crossing(), max_depth=6, samples=200, seed=42)
report.print()
# Shows loss rate at each nesting depth
# JSON: type diversity drives loss, not nesting depth
# Pickle: 0% loss at all depths
# CSV: lossy even on scalars (63%+)

CLI: crossing profile json
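The "type diversity, not depth" finding for JSON can be reproduced with a tiny stdlib sketch — wrap a leaf at increasing depths and see that only the leaf's type matters:

```python
import json

def nested(depth, leaf):
    # Wrap a leaf value in `depth` layers of single-key dicts.
    d = {"leaf": leaf}
    for _ in range(depth):
        d = {"child": d}
    return d

def lossy(d):
    return json.loads(json.dumps(d)) != d

# Depth alone never makes JSON lossy — the leaf type does.
for depth in (0, 3, 6):
    print(depth, lossy(nested(depth, "text")), lossy(nested(depth, (1, 2))))
# 0 False True
# 3 False True
# 6 False True
```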

Full Report

Run a comprehensive analysis combining test, complexity profile, and scaling in one call:

from crossing import full_report, json_crossing

fr = full_report(json_crossing(), samples=200, seed=42)
fr.print()
# Outputs: round-trip test, complexity profile,
# scaling analysis, idempotency check, and verdict

CLI: crossing report json

API

The Friday API provides a Crossing endpoint for analyzing Python packages:

curl https://api.fridayops.xyz/crossing/package/flask | python3 -m json.tool

Returns semantic exception analysis: which exceptions a package raises, where, how they're handled, and how much information is lost at each boundary.

Real-World Scans

Crossing's semantic scanner finds exception polymorphism in real codebases — places where the same exception type carries different meanings that handlers can't distinguish.
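One plausible way to read the bit counts below (assuming raise sites are weighted uniformly; the scanner's actual formula may differ):

```python
import math

def info_loss_bits(raise_sites: int, groups_distinguished: int) -> float:
    # N raise sites sharing one exception type carry log2(N) bits of
    # "which reason?" information; a handler that distinguishes k groups
    # preserves log2(k) of them. The difference is lost at the boundary.
    return math.log2(raise_sites) - math.log2(groups_distinguished)

# E.g. 13 raise sites funneled into one undifferentiated handler:
print(round(info_loss_bits(13, 1), 1))  # 3.7 bits lost — 100% collapse
```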

| Project | Files | Raises | Handlers | Crossings | Info Loss | Worst Finding |
|---|---|---|---|---|---|---|
| Pydantic | 105 | 513 | 263 | 108 | 13.7 bits | AttributeError: 75% collapse, TypeError: 181 raises |
| Django | 899 | 1980 | 1217 | 79 | 47.7 bits | ValueError: 483 raises, 65% collapse |
| Celery | 161 | 292 | 554 | 40 | 26.0 bits | KeyError: 19 raises, 112 handlers |
| pytest | 71 | 270 | 277 | 37 | 19.8 bits | TypeError: 50 raises, most polymorphic |
| Requests | 18 | 61 | 70 | 24 | 6.0 bits | InvalidURL: 7 raises, 100% collapse |
| pylint | 179 | — | — | 18 | 7.1 bits | ValueError: 10 raises, medium risk |
| Flask | 24 | 84 | 42 | 15 | 4.0 bits | NoAppException: 100% collapse |

Flask

Files scanned: 24 · Exception raises: 84 · Exception handlers: 42 · Semantic crossings: 15 · Total info loss: 4.0 bits

NoAppException: 13 raise sites, 2 handlers — 100% collapse
The handler erases all distinction between 13 different reasons the exception was raised.

TypeError: 9 raise sites, 4 handlers — 25% collapse
3 of 4 handlers inspect the exception object — productive collapse, not silent loss.

Requests

Files scanned: 18 · Exception raises: 61 · Exception handlers: 70 · Semantic crossings: 24 · Total info loss: 6.0 bits

InvalidURL: 7 raise sites, 1 handler — 100% collapse
Malformed URL, missing schema, invalid host, empty URL — all become the same signal to the handler.

ValueError: 8 raise sites, 10 handlers — 50% collapse
Half the semantic information survives.

pytest

Files scanned: 71 · Exception raises: 270 · Exception handlers: 277 · Semantic crossings: 37 · Total info loss: 19.8 bits

TypeError: 50 raise sites — most polymorphic
50 different meanings funneled through one type. The handler can't know which of 50 reasons triggered the exception.

AttributeError: 12 raise sites — high polymorphism
12 distinct contexts, from missing attributes to type coercion failures.

OSError: 7 raise sites, 24 handlers — high risk
None of the 24 handlers have direct OSError raises in their try body — catching from called functions only.

pylint

Files scanned: 179 · Exception raises: (scanned) · Semantic crossings: 18 · Total info loss: 7.1 bits

ValueError: 10 raise sites — medium risk
Configuration errors, type validation failures, and argument checking all share one exception type.

AssertionError: 4 raise sites — high risk
Internal invariant violations from different subsystems collapse to one signal.

Django

Files scanned: 899 · Exception raises: 1980 · Exception handlers: 1217 · Semantic crossings: 79 · Polymorphic: 37 · Elevated risk: 21 · Total info loss: 47.7 bits · Mean collapse: 42%

ValueError: 483 raise sites, 91 handlers — high risk
Raised in 335 different functions. Validation errors, configuration parsing, URL resolution, template rendering, and ORM operations all share one exception type. 8.4 bits entropy, 5.5 bits lost, 65% collapse.

TypeError: 238 raise sites, 44 handlers — high risk
191 different semantic contexts. Paginator type checks, serializer validation, template variable resolution, and model state introspection. 72% collapse.

ImportError: 7 raise sites, 57 handlers — high risk
GIS library loading, migration loading, app configuration, URL resolution. 57 handlers catch from called functions only.

LookupError: 12 raise sites, 37 handlers — medium risk
App registry, model lookup, translation fallback, content type resolution. 56% collapse.

Celery

Files scanned: 161 · Exception raises: 292 · Exception handlers: 554 · Semantic crossings: 40 · Polymorphic: 24 · Elevated risk: 12 · Total info loss: 26.0 bits · Mean collapse: 40%

KeyError: 19 raise sites, 112 handlers — high risk
Configuration lookup, step finalization, UID/GID parsing, and result set operations all raise KeyError. 112 handlers catch from called functions only.

Exception: 92 raise sites, 85 handlers — high risk
Broad catch across beat scheduler, worker start, backend operations, and task execution. 6.0 bits entropy, 2.4 bits lost.

WorkerTerminate: 100% collapse — high risk
3 shutdown reasons (hard shutdown, maybe_shutdown, consumer start failure) → 1 handler.

Pydantic

Files scanned: 105 · Exception raises: 513 · Exception handlers: 263 · Semantic crossings: 108 · Polymorphic: 36 · Elevated risk: 10 · Total info loss: 13.7 bits · Mean collapse: 20%

TypeError: 181 raise sites, 34 handlers — high risk
Schema generation, type validation, alias resolution, and discriminator logic all raise TypeError. 7.0 bits entropy, 2.3 bits lost, 32% collapse.

AttributeError: 20 raise sites, 33 handlers — high risk
V1→V2 migration wrappers, TypeAdapter init, BaseModel.__getattr__, and module-level fallbacks. 75% collapse — most semantic distinction lost.

LookupError: 3 raise sites, 1 handler — 100% collapse
Schema ref resolution and definition traversal → single handler in Discriminator._convert_schema.

PydanticOmit: 3 raise sites, 4 handlers — 100% collapse
Custom control-flow exception for JSON schema generation. All raise contexts treated identically.

Run your own: pip install crossing && crossing-semantic /path/to/codebase

The Thesis

Every time data crosses a system boundary, information can be silently lost. The loss isn't random — it's structural, determined by the format's type system and the data's actual types. A tuple becomes a list. An integer key becomes a string. A NaN becomes null. None of these trigger errors. All of them change the meaning of your data.

Crossing makes these invisible losses visible and measurable.