From a 2.5 KB retinal transcript to a 140 MB human chromosome — the complete dataset for mathematical structure discovery and genomic compression benchmarking.
Sorted by size ascending. Size classes: tiny <10 KB · small 10–200 KB · medium 200 KB–5 MB · large 5–50 MB · huge >50 MB
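
The size classes above can be expressed as a small lookup. This is a minimal sketch, not part of the benchmark tooling: the function name `size_class` is hypothetical, decimal units (1 KB = 1,000 bytes) are assumed because they match the "2.5 KB" and "140 MB" figures quoted for the smallest and largest files, and treating each boundary as belonging to the larger class is also an assumption.

```python
# Assumption: decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes).
KB = 1_000
MB = 1_000_000

# (exclusive upper bound in bytes, class name), checked in ascending order.
# Assumption: a file exactly on a boundary falls into the larger class.
SIZE_CLASSES = [
    (10 * KB, "tiny"),
    (200 * KB, "small"),
    (5 * MB, "medium"),
    (50 * MB, "large"),
]


def size_class(num_bytes: int) -> str:
    """Map a file size in bytes to one of the five size classes."""
    if num_bytes < 0:
        raise ValueError("file size cannot be negative")
    for upper_bound, name in SIZE_CLASSES:
        if num_bytes < upper_bound:
            return name
    return "huge"  # anything at or above 50 MB


if __name__ == "__main__":
    # Sizes taken from the table rows F01, F13, F14, F36, and F50.
    samples = [
        ("F01", 2_530),
        ("F13", 156_749),
        ("F14", 204_719),
        ("F36", 47_488_540),
        ("F50", 140_701_352),
    ]
    for idx, size in samples:
        print(idx, size_class(size))
```

Under binary units (1 KB = 1,024 bytes) a file just above 200,000 bytes, such as F14, would land in a different class, so the unit convention should be fixed before the classes are used to group results.
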
| Idx | Technical Filename | Size (bytes) | Cat | Biological Context |
|---|---|---|---|---|
| F01 | F01_TRX_Hs_Retina_NM_001014890.2.fasta | 2,530 | TRX | Human Retina mRNA; baseline scanner calibration |
| F02 | F02_TRX_Hs_SYN1_NM_006950.3.fasta | 3,330 | TRX | Synapsin I transcript; neuronal signal regulation |
| F03 | F03_TRX_Hs_NR4A2_NM_006186.4.fasta | 3,631 | TRX | Nuclear receptor; dopamine neuron development |
| F04 | F04_TRX_Hs_GNG2_NM_001243773.2.fasta | 3,711 | TRX | G-protein subunit; essential signaling molecule |
| F05 | F05_TRX_Hs_GNB4_NM_021629.4.fasta | 6,471 | TRX | Guanine nucleotide-binding protein; signal routing |
| F06 | F06_VIR_HIV1_AF033819.3.fasta | 9,349 | VIR | HIV-1 Genome; high mutation rate, chaotic structure |
| F07 | F07_VIR_Rabies_NC_001542.1.fasta | 12,147 | VIR | Rabies virus; negative-strand RNA architecture |
| F08 | F08_EUK_Ce_chrM_NC_001328.1.fasta | 14,086 | EUK | C. elegans mitochondrial DNA |
| F09 | F09_ORG_Mm_mito_NC_005089.1.fasta | 16,590 | ORG | Mouse mitochondrial DNA; "cellular battery" genome |
| F10 | F10_ORG_Hs_mito_NC_012920.1.fasta | 16,864 | ORG | Human mitochondrial DNA; conserved circular structure |
| F11 | F11_ORG_Zm_mito_NC_007982.1.fasta | 16,958 | ORG | Maize (Zea mays) mitochondrion; plant organelle |
| F12 | F12_VIR_SARS2_NC_045512.2.fasta | 30,429 | VIR | SARS-CoV-2; long RNA virus with high secondary structure |
| F13 | F13_ORG_At_chloro_NC_000932.1.fasta | 156,749 | ORG | Arabidopsis chloroplast DNA; photosynthetic machinery |
| F14 | F14_EUK_Hs_AC010145.fasta | 204,719 | EUK | Human MYCN region; amplified in certain cancers |
| F15 | F15_EUK_Sc_chrI_NC_001133.9.fasta | 233,584 | EUK | Yeast Chromosome I; smallest S. cerevisiae chromosome |
| F16 | F16_EUK_Sc_chrVIII_NC_001140.6.fasta | 572,081 | EUK | Yeast Chromosome VIII; medium-scale fungal DNA |
| F17 | F17_EUK_Sc_chrV_NC_001137.3.fasta | 586,543 | EUK | Yeast Chromosome V |
| F18 | F18_EUK_Sc_chrII_NC_001134.8.fasta | 826,794 | EUK | Yeast Chromosome II |
| F19 | F19_EUK_Sc_chrXIII_NC_001145.3.fasta | 939,899 | EUK | Yeast Chromosome XIII |
| F20 | F20_EUK_Sc_chrXVI_NC_001148.4.fasta | 963,926 | EUK | Yeast Chromosome XVI |
| F21 | F21_BAC_Syn1_CP002027.1.fasta | 1,094,313 | BAC | Synthetic Mycoplasma; first "man-made" genome |
| F22 | F22_EUK_Sc_chrXII_NC_001144.5.fasta | 1,096,206 | EUK | Yeast Chromosome XII |
| F23 | F23_EUK_Sc_chrVII_NC_001139.9.fasta | 1,109,182 | EUK | Yeast Chromosome VII (Sample A) |
| F24 | F24_EUK_Sc_chrVII_NC_001139.9.fasta | 1,109,182 | EUK | Yeast Chromosome VII (Control Duplicate) |
| F25 | F25_EUK_Sc_chrXV_NC_001147.6.fasta | 1,109,537 | EUK | Yeast Chromosome XV |
| F26 | F26_EUK_Sc_chrIV_NC_001136.10.fasta | 1,557,523 | EUK | Yeast Chromosome IV; largest in the yeast test set |
| F27 | F27_ARC_Mj_NC_000909.1.fasta | 1,688,828 | ARC | M. jannaschii; archaeal extremophile |
| F28 | F28_BAC_Drad_chr1_NC_001263.1.fasta | 2,686,574 | BAC | D. radiodurans; radiation-resistant genome |
| F29 | F29_BAC_Ecoli_U00096.3.fasta | 4,708,032 | BAC | E. coli K-12; "Rosetta Stone" of bacterial genomics |
| F30 | F30_EUK_Ce_chrI_NC_003279.8.fasta | 15,323,699 | EUK | Roundworm Chromosome I; transition to multicellularity |
| F31 | F31_EUK_Ce_chrX_NC_003284.9.fasta | 18,014,315 | EUK | Roundworm Chromosome X |
| F32 | F32_EUK_Ce_chrV_NC_003283.11.fasta | 21,272,974 | EUK | Roundworm Chromosome V |
| F33 | F33_EUK_Os_chr10_NC_008391.2.fasta | 23,594,136 | EUK | Rice (Oryza sativa) Chromosome 10; plant genome |
| F34 | F34_EUK_At_chr5_NC_003076.8.fasta | 27,425,149 | EUK | Arabidopsis Chromosome 5 |
| F35 | F35_EUK_At_chr1_NC_003070.9.fasta | 30,934,854 | EUK | Arabidopsis Chromosome 1; model plant architecture |
| F36 | F36_EUK_Hs_chr21_NC_000021.9.fasta | 47,488,540 | EUK | Human Chromosome 21; smallest human autosome |
| F37 | F37_EUK_Hs_chr22_NC_000022.11.fasta | 51,665,500 | EUK | Human Chromosome 22 |
| F38 | F38_EUK_Hs_chrY_CM000686.2_sample.fasta | 58,181,267 | EUK | Human Y Chromosome; unique repeat structures |
| F39 | F39_EUK_Hs_chr19_NC_000019.10.fasta | 59,594,634 | EUK | Human Chromosome 19; gene-dense |
| F40 | F40_EUK_Hs_chr20_NC_000020.11.fasta | 65,518,294 | EUK | Human Chromosome 20 |
| F41 | F41_EUK_Hs_chr18_NC_000018.10.fasta | 81,712,897 | EUK | Human Chromosome 18 |
| F42 | F42_EUK_Hs_chr17_NC_000017.11.fasta | 84,645,123 | EUK | Human Chromosome 17 |
| F43 | F43_EUK_Hs_chr16_NC_000016.10.fasta | 91,844,042 | EUK | Human Chromosome 16 |
| F44 | F44_EUK_Hs_chr15_NC_000015.10.fasta | 103,691,101 | EUK | Human Chromosome 15 |
| F45 | F45_EUK_Hs_chr14_NC_000014.9.fasta | 108,827,838 | EUK | Human Chromosome 14 |
| F46 | F46_EUK_Hs_chr13_NC_000013.11.fasta | 116,270,459 | EUK | Human Chromosome 13 |
| F47 | F47_EUK_Hs_chr12_NC_000012.12.fasta | 135,496,623 | EUK | Human Chromosome 12 |
| F48 | F48_EUK_Hs_chr10_NC_000010.11.fasta | 136,027,438 | EUK | Human Chromosome 10 |
| F49 | F49_EUK_Hs_chr11_NC_000011.10.fasta | 137,338,124 | EUK | Human Chromosome 11 |
| F50 | F50_EUK_Hs_chr09_NC_000009.12.fasta | 140,701,352 | EUK | Human Chromosome 9; maximum complexity stress test |
F23 and F24 are both Yeast Chromosome VII (NC_001139.9): identical 1,109,182-byte files. This deliberate duplicate checks that every encoder produces byte-for-byte identical output on identical input, confirming deterministic behavior across all benchmark runs.
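
One way to run the duplicate check is to hash both inputs and both encoder outputs and compare the digests. The sketch below is illustrative only: `check_duplicate_pair` is a hypothetical helper, `zlib.compress` stands in for whichever encoder is under test, and the short in-memory sequence replaces the real F23 and F24 files (which could be loaded with `pathlib.Path.read_bytes()`).

```python
import hashlib
import zlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def check_duplicate_pair(
    sample: bytes,
    duplicate: bytes,
    encode: Callable[[bytes], bytes],
) -> bool:
    """Return True if the encoder gives identical output for identical input.

    Raises ValueError when the two inputs differ, because the check is
    only meaningful for a true duplicate pair.
    """
    if sha256_hex(sample) != sha256_hex(duplicate):
        raise ValueError("inputs differ; the pair is not a true duplicate")
    return sha256_hex(encode(sample)) == sha256_hex(encode(duplicate))


if __name__ == "__main__":
    # Synthetic stand-in for the contents of F23 and F24.
    fasta = b">demo sequence\n" + b"ACGTTGCA" * 64 + b"\n"
    # zlib.compress is only a placeholder for the encoder under test.
    deterministic = check_duplicate_pair(fasta, bytes(fasta), zlib.compress)
    print("deterministic:", deterministic)
```

Comparing digests rather than raw buffers keeps the result easy to log alongside other benchmark output, which helps when comparing runs across machines.
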