The first lossless compression framework that systematically replaces data with the deterministic mathematical generators that produce it — verified bit-perfect via SHA3-256.
Mainstream lossless compressors such as ZIP, 7-Zip, zstd, PNG, and FLAC share one approach: find statistical patterns in the bytes and encode them more compactly. 8Z asks a different question: is there a deterministic mathematical formula that generates this data? If yes, it stores the formula. If no, it falls back to statistical coding.
The framework uses Minimum Description Length (MDL) as the arbiter: a mathematical generator is emitted only when the complete cost of encoding its parameters is strictly less than both the raw bytes and the output of the best entropy coder. SHA3-256 verification confirms that every reconstruction is bit-perfect. Regressions are ruled out by construction, because MDL ties resolve to non-MATH.
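The MDL rule can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the 8Z implementation: the `ramp` generator (an arithmetic progression modulo 256), the 6-byte parameter layout, the choice to count the 32-byte SHA3-256 digest as part of the MATH cost, and `zlib` as a stand-in for "the best entropy coder".

```python
import hashlib
import struct
import zlib

def ramp(start: int, step: int, length: int) -> bytes:
    """Hypothetical generator: arithmetic progression modulo 256."""
    return bytes((start + step * i) % 256 for i in range(length))

def fit_ramp(chunk: bytes):
    """Return (start, step) if the chunk is exactly a ramp, else None."""
    if len(chunk) < 2:
        return None
    start, step = chunk[0], (chunk[1] - chunk[0]) % 256
    return (start, step) if ramp(start, step, len(chunk)) == chunk else None

def choose_encoding(chunk: bytes):
    """Return (tag, payload) with the smallest total cost in bytes."""
    # Non-MATH candidates; RAW is listed first so it wins RAW/LZ ties.
    candidates = [("RAW", chunk), ("LZ", zlib.compress(chunk, 9))]
    best = min(candidates, key=lambda c: len(c[1]))

    params = fit_ramp(chunk)
    if params is not None:
        digest = hashlib.sha3_256(chunk).digest()
        # Complete MATH cost (assumed): start, step, length, plus digest.
        payload = struct.pack(">BBI", *params, len(chunk)) + digest
        # Strict inequality: a tie resolves to the non-MATH candidate.
        if len(payload) < len(best[1]):
            return "MATH", payload
    return best

def decode_math(payload: bytes) -> bytes:
    """Regenerate the chunk and verify it against the stored digest."""
    start, step, length = struct.unpack(">BBI", payload[:6])
    out = ramp(start, step, length)
    if hashlib.sha3_256(out).digest() != payload[6:]:
        raise ValueError("SHA3-256 mismatch")
    return out
```

In this sketch a 4,096-byte ramp costs 38 bytes as MATH, while a 10-byte ramp stays RAW because the 38-byte MATH payload would be larger than the data itself.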
On the v2 image test set (USC/SIPI landscape.tif, 1024×1024, 16-bit grayscale), 8Z compresses the image to 46.8% of its uncompressed size, smaller than PNG, 7-Zip, and ZIP. 15.6% of chunks were encoded as pure MATH, with 97.4% average savings on those chunks. The result was independently replicated by two external teams (Gemini and DeepSeek).
| Method / Container | Size (KB) | Ratio (% of uncompressed) |
|---|---|---|
| Uncompressed TIFF | 2,049 | 100% |
| ZIP (deflate) | 1,447 | 70.6% |
| 7-Zip (LZMA2) | 1,156 | 56.4% |
| PNG | 1,000 | 48.8% |
| 8Z v1 — binary, global MDL | 959 | 46.8% |
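The Ratio column is each size divided by the uncompressed TIFF size. A short Python check, using only the sizes from the table, reproduces the column and the margin over PNG:

```python
# Sizes in KB, copied from the table above.
sizes_kb = {
    "Uncompressed TIFF": 2049,
    "ZIP (deflate)": 1447,
    "7-Zip (LZMA2)": 1156,
    "PNG": 1000,
    "8Z v1": 959,
}

baseline = sizes_kb["Uncompressed TIFF"]

# Ratio = compressed size as a percentage of the uncompressed size.
ratios = {name: round(100 * kb / baseline, 1) for name, kb in sizes_kb.items()}

# Relative saving of 8Z v1 compared with PNG, in percent.
saving_vs_png = round(100 * (1 - sizes_kb["8Z v1"] / sizes_kb["PNG"]), 1)

for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:5.1f}%")
print(f"8Z v1 is {saving_vs_png}% smaller than PNG")
```

On these figures 8Z v1 is 4.1% smaller than the PNG file.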
Target: ~360 KB (about 17.6% of the uncompressed size) with row-adaptive filters, residual bit-planes, deeper reversible wavelets, a global zstd dictionary, and orientation/stripe adaptivity. None of these roadmap items is implemented in the results above, so the architecture has substantial headroom beyond the current numbers.
The same MDL + DCC architecture transfers across fundamentally different data types. Each domain validates the others — convergent evidence for the mathematical compression principle.
A 30-year journey from a transformative experience to working prototypes that beat industry standards.
8Z was built through human × AI collaboration. Here's what the AI systems said when they first saw the validation results.
Every chunk of data goes through a competition. Multiple mathematical generators and entropy coders battle — MDL picks the winner. The result is a file where some chunks are stored as formulas and others as compressed bytes, seamlessly mixed.
Strict MDL: ties resolve to non-MATH · Fail-closed: any hash mismatch → LZ/RAW fallback · Signed Atlas governs allowed generators
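The per-chunk competition and the fail-closed rule can be sketched in Python as follows. This is a toy model under stated assumptions, not the 8Z container format: `ATLAS` is a plain allow-list (signature checking is not modeled), `gen_const` is a hypothetical constant-fill generator, `zlib` stands in for the LZ coder, and the record layout and the 64 KiB chunk size are invented for illustration.

```python
import hashlib
import zlib

CHUNK = 1 << 16            # assumed chunk size (64 KiB)
ATLAS = {"const"}          # hypothetical allow-list; signing is not modeled

def gen_const(params: bytes, length: int) -> bytes:
    """Hypothetical generator: one byte value repeated `length` times."""
    return params[:1] * length

GENERATORS = {"const": gen_const}

def propose(chunk: bytes):
    """Yield (generator name, params) candidates for this chunk."""
    if chunk and chunk.count(chunk[0]) == len(chunk):
        yield "const", chunk[:1]

def encode_chunk(chunk: bytes):
    """Return a (tag, body, length) record for one chunk."""
    digest = hashlib.sha3_256(chunk).digest()
    lz = zlib.compress(chunk, 9)
    fallback = ("LZ", lz) if len(lz) < len(chunk) else ("RAW", chunk)
    for name, params in propose(chunk):
        if name not in ATLAS:
            continue  # generator not permitted by the allow-list
        rebuilt = GENERATORS[name](params, len(chunk))
        if hashlib.sha3_256(rebuilt).digest() != digest:
            continue  # fail closed: hash mismatch, keep the LZ/RAW fallback
        cost = len(name) + len(params) + len(digest)
        if cost < len(fallback[1]):  # strict MDL: ties stay non-MATH
            return ("MATH", (name, params, digest), len(chunk))
    return (fallback[0], fallback[1], len(chunk))

def decode_chunk(record) -> bytes:
    """Rebuild one chunk; MATH records are re-verified with SHA3-256."""
    tag, body, length = record
    if tag == "RAW":
        return body
    if tag == "LZ":
        return zlib.decompress(body)
    name, params, digest = body
    out = GENERATORS[name](params, length)
    if hashlib.sha3_256(out).digest() != digest:
        raise ValueError("SHA3-256 mismatch")
    return out

def encode(data: bytes):
    """Split data into chunks and encode each one independently."""
    return [encode_chunk(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def decode(records) -> bytes:
    return b"".join(decode_chunk(r) for r in records)
```

For example, `encode(bytes(CHUNK) + b"abc" * 1000)` produces one MATH record followed by one LZ record, and `decode` restores the input exactly. If a generator regenerates the wrong bytes, the hash check rejects it at encode time and the chunk is stored through the fallback path instead.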
The 8Z Mathematical Data Compression paper (v2.2) covers the complete framework: architecture, MDL accounting, generator library, Atlas governance, Phase-1 validation, and the roadmap. Co-authored with six LLM research partners.