Round 2 tested 9 encoder variants built by 8 different LLMs, all starting from the same gemZ v9.3 baseline architecture, benchmarked on 50 FASTA files (1.58 GB in total) spanning tiny transcripts (2.5 KB) through full human chromosomes (140 MB).
Key finding: Seven of nine encoders converged to near-identical totals (~305.4 MB), differing by less than 1 KB on most files. The architecture is at its ceiling. The two outliers (GLMz at 311.5 MB, GPTz at 364.1 MB) regressed.
The GeCo3 reference achieves ~266.4 MB (16.88%) — roughly 39 MB ahead of all 8Z variants. This is the real competitive gap to close.
| # | Encoder | LLM | Lines | Total Compressed (bytes) | Ratio | vs 7-Zip | Wall Time | Files OK |
|---|---|---|---|---|---|---|---|---|
| — | GeCo3 | Reference (C) | — | ~266,386,513 | 16.88% | −22.2% | — | 50 |
| 1 | DSEz | DeepSeek-V3 R1 | 727 | 305,423,381 | 19.35% | −10.8% | 6h 47m | 50 |
| 1 | GEMz_orig | Original v9.3 | 644 | 305,423,381 | 19.35% | −10.8% | 5h 06m | 50 |
| 3 | GEMz | Gemini 3 Pro | 715 | 305,423,431 | 19.35% | −10.8% | 6h 48m | 50 |
| 3 | MMAz | MiniMax M2.5 Think | 711 | 305,423,431 | 19.35% | −10.8% | 6h 45m | 50 |
| 5 | KIMz | Kimi K2.5 Think | 743 | 305,423,488 | 19.35% | −10.8% | 7h 00m | 50 |
| 6 | QWEz | Qwen 3.5 Plus | 718 | 305,424,181 | 19.35% | −10.8% | 6h 34m | 50 |
| 7 | GROz | Grok 4.2 (beta) | 649 | 305,911,469 | 19.38% | −10.7% | 6h 19m | 50 |
| 8 | GLMz | GLM 5 DeepThink | 789 | 311,527,258 | 19.74% | −9.0% | 5h 30m | 50 |
| 9 | GPTz | ChatGPT 5.2 Think | 787 | 364,133,180 | 23.07% | +6.3% | 8h 08m | 50 |
DeepSeek-V3 R1 produced output identical to the Original v9.3 baseline — 0 files differ. It reproduced the reference encoder perfectly.
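That identity claim is easy to re-verify: hash every compressed output and diff the two runs. A minimal sketch, assuming one output file per input in each run directory (the `.8z` extension and layout are illustrative, not the benchmark's actual naming):

```python
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB chunks to handle chromosome-sized outputs."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def diff_outputs(dir_a: Path, dir_b: Path) -> list[str]:
    """Names of compressed files that differ between two encoder runs.
    Assumes dir_b contains a file of the same name for each file in dir_a."""
    return [p.name for p in sorted(dir_a.glob("*.8z"))
            if digest(p) != digest(dir_b / p.name)]
```

An empty list over all 50 outputs is what "0 files differ" corresponds to here.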
| Compressor | Type | Total (bytes) | Ratio |
|---|---|---|---|
| GeCo3 | Context-mixing AC (C) | ~266,386,513 | 16.88% |
| 8z-GPT (R1) | HM4 hybrid (Python) | 302,134,518 | 19.14% |
| 8z-CLA (R1) | HM4 hybrid (Python) | 302,149,410 | 19.14% |
| 8Z DSEz (R2 best) | MDL + transforms (Python) | 305,423,381 | 19.35% |
| HYB4 (Streaming+CTX8) | MDL + CTX8+DCC (Python) | 318,738,170 | 20.19% |
| 7-Zip | LZMA2 (generic) | 342,442,953 | 21.69% |
| ZIP | deflate (generic) | 380,918,790 | 24.13% |
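The Ratio and vs-7-Zip columns are straightforward to recompute from the byte totals. A sketch, with the corpus size back-derived from the table's own ratios, since the report only states "1.58 GB":

```python
# CORPUS_BYTES is an approximation back-derived from 305,423,381 / 19.35%;
# the exact raw corpus byte count is not stated in this report.
CORPUS_BYTES = 1_578_400_000

def ratio(compressed: int) -> float:
    """Compressed size as a fraction of the raw corpus."""
    return compressed / CORPUS_BYTES

def vs_7zip(compressed: int, sevenzip: int = 342_442_953) -> float:
    """Signed percentage delta relative to the 7-Zip total."""
    return (compressed - sevenzip) / sevenzip * 100

print(f"{ratio(305_423_381):.2%}")      # 19.35% (DSEz)
print(f"{vs_7zip(305_423_381):+.1f}%")  # -10.8%
```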
Bits per base (bpb) on representative genomes, compared against GeCo3, JARVIS3, and MFCompress — all C-compiled.
| File | Genome | ACGT bp | #1 | bpb | #2 | bpb | gemZ bpb | Rank |
|---|---|---|---|---|---|---|---|---|
| F06 | HIV-1 | 9,181 | GeCo3 | 2.002 | JARVIS3 | 2.022 | 2.748 | 5/6 |
| F10 | Hs mito | 16,568 | GeCo3 | 1.964 | JARVIS3 | 2.022 | 2.402 | 4/6 |
| F12 | SARS-CoV-2 | 29,903 | GeCo3 | 1.960 | JARVIS3 | 2.012 | 2.234 | 4/6 |
| F14 | Hs MYCN | 198,960 | GeCo3 | 1.764 | JARVIS3 | 1.856 | 1.936 | 3/6 ★ |
| F21 | Synthia | 1,078,809 | GeCo3 | 1.683 | MFComp | 1.722 | 1.782 | 3/6 ★ |
| F29 | E. coli | 4,641,652 | GeCo3 | 1.887 | MFComp | 1.919 | 1.981 | 3/6 ★ |
gemZ ranks 3rd on F14, F21, F29 — a Python single-threaded encoder outperforming JARVIS3 (2024 SOTA, C-optimized) on the three largest genomes. On F21 the margin is −7.7% vs JARVIS3.
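bpb here is simply compressed bits over ACGT symbols. Inverting the F29 row gives a feel for the scale (the byte figure below is a back-calculation, not a measured size):

```python
def bits_per_base(compressed_bytes: int, acgt_bases: int) -> float:
    """bpb = compressed size in bits divided by the number of ACGT symbols."""
    return compressed_bytes * 8 / acgt_bases

# F29: 1.981 bpb over 4,641,652 bp implies roughly 1.15 MB of compressed output.
f29_bytes = round(1.981 * 4_641_652 / 8)
assert abs(bits_per_base(f29_bytes, 4_641_652) - 1.981) < 1e-3
```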
Genomes grouped by mathematical signal strength. The gap narrows from 25% down to 5% as structure gets stronger.
| Tier | Genomes | gemZ bpb | GeCo3 bpb | Gap |
|---|---|---|---|---|
| Tier 1 Compositional | F06, F07, F10, F12 | 2.478 | 1.981 | +25.1% |
| Tier 2 Context-sensitive | F13, F14, F21 | 1.827 | 1.684 | +8.5% |
| Tier 3 Robust structure | F15, F29 | 1.981 | 1.884 | +5.2% |
The GeCo3 gap shrinks from 25% → 8.5% → 5.2% as mathematical structure gets stronger. gemZ's transform-based approach is most effective where DNA contains exploitable structure. GeCo3's advantage lies in its statistical model's handling of weak-signal regions.
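The Gap column is the relative bpb overhead of gemZ over GeCo3. Recomputing it from the (already rounded) tier averages reproduces the table to within 0.1 percentage points:

```python
def gap_pct(gemz_bpb: float, geco3_bpb: float) -> float:
    """'Gap' column: relative bpb overhead of gemZ vs GeCo3, in percent."""
    return (gemz_bpb - geco3_bpb) / geco3_bpb * 100

# (gemZ bpb, GeCo3 bpb, Gap from the table) per tier; small rounding drift
# is expected because the bpb averages are themselves rounded to 3 decimals.
tiers = [(2.478, 1.981, 25.1), (1.827, 1.684, 8.5), (1.981, 1.884, 5.2)]
for gemz, geco3, table_gap in tiers:
    assert abs(gap_pct(gemz, geco3) - table_gap) < 0.1
```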
| Size Class | Files | Best Encoder | Wins | Notes |
|---|---|---|---|---|
| Small (<100 KB) | 11 | GPTz | 7 | Lower container overhead on tiny files |
| | | DSEz | 2 | Wins where NIB kicks in (14–17 KB) |
| Medium (100 KB–10 MB) | 18 | DSEz | 14 | NIB+brotli dominates yeast/bacterial |
| | | GROz | 4 | Wins with solid+nib or byte variant |
| Large (>10 MB) | 21 | DSEz | 18 | split\|brotli+nib with RAW/PER modes |
| | | GROz | 2 | chr19 benefits from byte-mode brotli |
Representative files where GROz beat DSEz, the best overall encoder:
| File | GROz Strategy | GROz (bytes) | DSEz (bytes) | Savings (bytes) |
|---|---|---|---|---|
| chr19 (gene-dense) | split\|brotli+byte | 11,313,438 | 11,386,032 | −72,594 |
| Ce chrV | solid\|brotli+nib | 4,777,626 | 4,820,116 | −42,490 |
| Zm mito | variant | 134,065 | 138,053 | −3,988 |
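The strategy labels (solid vs split, byte vs nib) reflect per-file codec competition: build every candidate, keep the smallest. A stdlib-only sketch of that selection loop, with zlib standing in for brotli and non-ACGT symbols ignored for brevity:

```python
import zlib

def pack_nibbles(seq: bytes) -> bytes:
    """'nib' transform: 4-bit base codes packed two per byte.
    A real encoder would also store the length and escape non-ACGT symbols."""
    code = {65: 0, 67: 1, 71: 2, 84: 3}  # ASCII A, C, G, T
    nibs = [code[b] for b in seq]
    if len(nibs) % 2:
        nibs.append(0)  # pad the final half-byte
    return bytes((nibs[i] << 4) | nibs[i + 1] for i in range(0, len(nibs), 2))

def compete(seq: bytes) -> tuple[str, bytes]:
    """Try byte-mode and nibble-mode candidates; keep whichever is smallest."""
    candidates = {
        "byte": zlib.compress(seq, 9),
        "nib": zlib.compress(pack_nibbles(seq), 9),
    }
    mode = min(candidates, key=lambda k: len(candidates[k]))
    return mode, candidates[mode]
```

The winning mode name would be recorded in the container header so the decoder knows which inverse transform to apply.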
HYB4 (Claude Opus 4.6) introduced streaming mode, order-8 context models, and dynamic codec competition. It achieved 20.19% ratio but regressed 4.4% vs GEMz.
HYB4 regressed on large human chromosomes (F37–F50) producing 5–9% larger outputs than GEMz. Streaming mode and CTX8 added overhead not recovered by improved predictions. The simpler GEMz with tight MDL-governed mode selection proved more efficient at scale.
On small-to-medium files (F01–F36), HYB4 matched or beat GEMz on most files. The regression is concentrated in the largest human chromosomes. CTX8 could work in a non-streaming architecture.
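For reference, an order-k context model of the kind CTX8 implies fits in a few lines. This toy version (names and smoothing are illustrative, not HYB4's actual code) reports the ideal adaptive code length of a sequence:

```python
import math
from collections import defaultdict

class ContextModel:
    """Order-k adaptive model: per-context base counts with add-one smoothing."""
    def __init__(self, k: int = 8):
        self.k = k
        self.counts = defaultdict(lambda: [1, 1, 1, 1])  # Laplace prior over ACGT
        self.idx = {"A": 0, "C": 1, "G": 2, "T": 3}

    def cost_bits(self, seq: str) -> float:
        """Ideal code length of seq in bits under the adaptive model."""
        bits = 0.0
        for i, base in enumerate(seq):
            ctx = seq[max(0, i - self.k):i]  # preceding k bases
            c = self.counts[ctx]
            j = self.idx[base]
            bits += -math.log2(c[j] / sum(c))
            c[j] += 1  # update after coding, so a decoder could replay it
        return bits
```

On repetitive input the per-base cost drops well below the 2-bit naive bound after warm-up, which is exactly the effect CTX8 chases; the open question from Round 2 is whether that gain survives the container overhead at chromosome scale.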
| Scenario | Total | vs DSEz |
|---|---|---|
| DSEz (best single encoder) | 305,423,381 | — |
| Oracle (best-of-all per file) | 305,301,168 | −122 KB (−0.04%) |
| GeCo3 target | ~266,386,513 | −39 MB (−12.8%) |
A perfect per-file selector across all 9 encoders saves only 122 KB (0.04%). The real gap is 39 MB (12.8%) to GeCo3.
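The oracle number is just the sum of per-file minima across encoders. A sketch with made-up sizes (not the benchmark's real per-file data):

```python
def oracle_total(per_file_sizes: dict[str, dict[str, int]]) -> int:
    """Perfect per-file selector: sum the smallest size any encoder achieved."""
    return sum(min(sizes.values()) for sizes in per_file_sizes.values())

# Illustrative sizes only, not the benchmark's real per-file numbers.
sample = {
    "F14": {"DSEz": 48_000, "GROz": 48_900, "GPTz": 47_500},
    "F29": {"DSEz": 1_149_000, "GROz": 1_151_000, "GPTz": 1_300_000},
}
assert oracle_total(sample) == 47_500 + 1_149_000
```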
| # | Encoder | s/MB | Relative | Notes |
|---|---|---|---|---|
| 1 | GEMz_orig | 137 | 1.0× | Fastest + best compression |
| 2 | GROz | 247 | 1.8× | |
| 3 | QWEz | 366 | 2.7× | |
| 4 | DSEz | 443 | 3.2× | |
| 5 | GEMz | 513 | 3.7× | |
| 6 | KIMz | 540 | 3.9× | |
| 7 | GLMz | 718 | 5.2× | |
| 8 | GPTz | 1,604 | 11.7× | Slowest by far |
| Phase | Technique | Expected Gain | Complexity |
|---|---|---|---|
| 1 | Merge Round 1 HYB (CTX3 + multi-block-size + solid-vs-split) | ~3.3 MB | Medium |
| 2 | Adaptive container (minimal header for small files) | ~2–3 KB | Low |
| 3 | Higher-order context models (order-5 to order-8) | 5–10% | High |
| 4 | ANS/rANS arithmetic backend replacing LZMA/brotli | 10–15% | Very High |
| 5 | Context mixing (PPM* or CM-style) for base prediction | 15–20% | Very High |