How a lossless audio codec was built from scratch in a single sprint — and why it beats the 25-year-old gold standard on nearly half of all test clips.Kako je brezizgubni avdio kodek nastal iz nič v enem samem sprintu — in zakaj prekaša 25 let star zlati standard pri skoraj polovici testnih posnetkov.
8Z-Audio is the third domain-specific compressor in the 8Z-LO (Lossless Optimized) framework, following 8Z-FASTA (genomic DNA sequences) and 8Z-TIF (satellite imagery). The project didn't start from an audio textbook — it started from a radical question: what if the same mathematical architecture that decodes DNA compresses music?
Before any audio code was written, the team ran a sanity check: convert Pink Floyd's "Shine On You Crazy Diamond" into DNA bases (A, C, G, T) and compress it with gemZ, the FASTA compressor built for genomic sequences.
gemZ's DNA models found zero hits — 100% RAW blocks. The biological pattern-matchers understood nothing about waveforms. But the architectural skeleton — MDL model battles, per-block predictor selection, residual coding — was exactly what audio needed.
The DCC was born in the TSP solver, migrated into the FASTA encoder, and landed in audio. It's an adaptive search governor: a learned probability model that narrows the encoder's candidate search space in real time, spending compute budget where the signal is hardest and coasting where it's easy.
In v1.5, DCC settled for the first confirmed time on a real audio file (Pink Floyd 192kHz, u=10→15), proving the concept transfers across domains.
The 8Z framework rests on one principle: Minimum Description Length (MDL) model selection. Every bit saved must correspond to genuine reconstruction capability — not statistical correlation, not heuristic intuition. Each frame of audio is a competition. Multiple predictors battle; MDL picks the winner. FLAC runs one predictor (LPC) on every frame, always. That's the gap.
From an empty file to a compressor that beats FLAC at maximum compression. Every version had a lesson. Every regression had a diagnosis.
The first working encoder was built in a single chat session. LPC via Levinson-Durbin autocorrelation, DELTA prediction ported from gemZ, MDL battle per frame, residual compression via int32 → LZMA, all four stereo decorrelation modes, bit-perfect verification.
Implemented proper Golomb-Rice coding with partition search, QLP quantization precision search, and three apodization windows (Hann, Tukey, none). The encoder crossed into FLAC-5 territory for the first time.
Exhaustive combinatorial search across all candidate configurations — LPC orders, windows, QLP precision — evaluated per frame with MDL arbitration. Expensive. Brutally thorough. This became the permanent compression quality ceiling to beat.
The DCC search governor was ported from the HYB4 FASTA encoder. 7-bit event encoding was too fine-grained — the DCC never found a stable pattern to exploit. Encode time dropped to 102 minutes (from 11 hours), but compression regressed slightly. Lesson: DCC event granularity matters enormously.
Switching DCC from 7-bit to 4-bit events was the breakthrough: the controller finally settled on Pink Floyd 192kHz (u=10→15). Block size optimized to 16384. Scanner v1.2, clipper v1.2, and parallel pipeline v1.2 built in the same session. 15 clips across 10 songs encoded simultaneously.
Two parallel codec families consolidated: 8Z-AC (native .8za format) with MDL-driven frame size selection, and xFLAC (aFLAC + vFLAC + avFLAC) producing valid .flac files via per-segment arena competition. vFLAC v1.5 introduced MDL block probe — empirically testing 3 block sizes per block instead of heuristic assignment. avFLAC's MDL final comparison picks the best of arena stitch, AF_whole, and VF_whole per file.
FLAC uses one prediction model for every frame of audio. 8Z-Audio uses up to 600 candidates, evaluated by MDL, with an adaptive search governor that learns which candidates work for this particular signal.
Vs. FLAC's 0–12. Higher orders capture longer-range correlations in complex harmonic content.
Hann, Tukey, and no apodization — evaluated per frame. FLAC uses no apodization at its default settings.
Exhaustive search over 8–16 bits of quantization precision per frame. FLAC uses a fixed precision per level.
The Digital Claustrum Controller learns per-signal statistics, adapting search depth to where the signal is hardest.
Every bit counts. MDL sees the true total cost including predictor overhead — no bit is "free."
Mid-Side, Left-Side, Right-Side, and Independent — selected per frame by MDL, not globally per file.
15 clips across 10 songs — selected by the 8Z Audio Scanner v1.2 to cover all difficulty categories. Encoded in parallel with 8 workers. All results verified lossless (SHA3-256). Compression ratio: lower is better (fraction of original size).
Total — avFLAC wins: 15/15 clips vs FLAC-12 · 8/15 vs best baseline · 3/15 vs OptimFROG (Radiohead)
The avFLAC arena splits tracks into acoustically distinct segments, encodes each with 43 candidates (including vFLAC v1.5 with MDL block probe), then MDL picks the smallest final output from arena stitch, aFLAC whole, or vFLAC whole.Arena avFLAC razdeli skladbo na akustično razločne odseke, vsakega kodira z 43 kandidati (vključno z vFLAC v1.5 z MDL sondo za bloke), nato MDL izbere najmanjši končni izhod med arena stitchem, aFLAC celodatotečnim ali vFLAC celodatotečnim.
Scanner data across 10 songs proves AI-generated audio (Producer.ai) has mean difficulty 0.25 vs. human recordings at 0.50 at matched sample rates. AI music is structurally simpler — more sustained tones, less transient chaos. Business implication: a specialized codec for AI audio platforms could achieve significantly better ratios than general-purpose lossless codecs.
The Digital Claustrum Controller reached a stable learned state on Pink Floyd 192kHz (30s, 563 frames). The critical parameter: 4-bit events instead of 7-bit. The finer granularity of 7-bit encoding produced too much noise for the controller to find a pattern. Coarser events → cleaner signal → stable adaptation. Two-pass architecture will eliminate the sequential bottleneck entirely.
Industrial music (Rammstein) consistently catastrophically fails in OptimFROG — 39.3% worse than FLAC on the hardest 10-second clip, 26–28% worse on the others. 8Z-Audio also loses on Rammstein, but far less badly (+3–8%). This is a specific architectural opportunity: if 8Z can handle industrial music better, it becomes a competitive differentiator in an otherwise mature field.
DCC: TSP solver → FASTA encoder → audio encoder. Scanner concept: DNA pipeline → audio difficulty pre-screener. MDL framework: 8Z image encoder → audio frame selection. The same architectural patterns — MDL arbitration, per-block model competition, adaptive search governors — produce gains across every domain they've been applied to.
On tonal content: 3 for 3. On dynamic content: 1 for 1. On the "easiest" category: 1 for 1. On buildups: 1 for 1. 8Z-Audio's multi-model approach was purpose-built for structured content — and the benchmark confirmed it. The remaining losses are concentrated in industrial/transient content where LPC's weaknesses are known and targeted for v1.6.
Every major lossless audio codec — FLAC (2001), WavPack (2002), TAK (2007), OptimFROG, Monkey's Audio — uses LPC as its primary or sole prediction model. No production codec performs per-frame multi-model MDL selection, harmonic modeling, or periodic template matching. The field has been architecturally frozen since ~2007. 8Z-Audio is the first systematic attempt to break out of that constraint.
Abyssal Arena v2.1 proved that per-segment encoding decisively wins on dynamically varied music. The track ranges from near-silence (d=0.004) to full climax (d=0.392) — acoustic variance that a whole-file encoder cannot exploit. lax+tukey won 10 of 12 segments with blocksize 8192 and LPC-32. The raw payload beats the best whole-file encoder by 1,078,771 bytes. Uniform tonal music (Ethereal Arc) shows no gain — the approach pays off precisely where real production music has diversity.Abyssal Arena v2.1 je dokazala, da per-segmentno kodiranje odločilno zmaguje pri dinamično raznoliki glasbi. Skladba se razteza od skorajšnje tišine (d=0,004) do polnega vrhunca (d=0,392) — akustična varianca, ki je celodatotečni kodirnik ne more izkoristiti. lax+tukey je zmagal na 10 od 12 segmentov z velikostjo bloka 8192 in LPC-32. Surovi zapis premaga najboljši celodatotečni kodirnik za 1.078.771 bajtov. Enakomerna tonalna glasba (Ethereal Arc) ne kaže dobička — pristop se obrestuje natanko tam, kjer ima prava produkcijska glasba raznolikost.
Development follows two parallel tracks toward OptimFROG-level compression. The xFLAC track (X1–X3) optimizes within the FLAC format. The AC Classical Max track (C1–C5) builds the native .8za format beyond FLAC's limits. Both share the MDL+DCC philosophy: let cost decide, not heuristics. Razvoj poteka po dveh vzporednih smereh proti kompresiji na ravni OptimFROG. Smer xFLAC (X1–X3) optimizira znotraj FLAC formata. Smer AC Classical Max (C1–C5) gradi lastni format .8za onkraj FLAC omejitev.
Eliminate segment header overhead that turns raw compression wins into file-size losses. Frame-level competition or zero-overhead stitching.
Add rANS to MDL arena alongside Rice + LZMA. Per-partition selection. Captures non-Laplace residual distributions that Rice wastes bits on.
Replace Levinson-Durbin LPC with OLS (actual sample matrix, no stationarity assumption) cascaded with NLMS adaptive filter. The biggest single gain.
Cross-channel OLS taps — each channel uses past samples from both channels. Captures delayed and frequency-dependent stereo correlation that M/S misses.
Onset-aware block splitting in vFLAC. 1024-sample blocks for drum attacks instead of wasting 4096-sample blocks on 50-sample transients.
vFLAC v1.5 over-splits on 24-bit content (1.47 MB behind lax_t6 on LG-DWAS). Needs bit-depth-aware overhead model in MDL cost function.
avFLAC tested on 18 files (3 full tracks + 15 clips). All max compression, lossless verified. avFLAC testiran na 18 datotekah (3 celotne skladbe + 15 posnetkov). Vse max kompresija, brezizgubno preverjeno.
Six original compositions by Bojan Dobrečevič, created via Producer.ai — used as the AI half of the 8Z-Audio benchmark corpus. Scanner data shows AI-generated audio is 1.9× more compressible than human recordings, making them ideal test subjects. Play the 10–30 second benchmark clip (WAV) or the full song (MP3).