AIM³ Institute · 8Z-LO Framework · FASTA Collection

50 Genomes, One Corpus

From a 2.5 KB retinal transcript to a 140 MB human chromosome — the complete dataset for mathematical structure discovery and genomic compression benchmarking.

50 FASTA Files · 1.58 GB Total Size · 5 Categories · 3 Domains of Life
Taxonomic Coverage

All Three Domains of Life + Viruses

EUK · Eukaryota: 34
VIR · Viruses: 3
BAC · Bacteria: 3
ARC · Archaea: 1
ORG · Organelles: 4
TRX · Transcripts: 5
Species: 8
Complete Manifest

All 50 FASTA Files

Sorted by size ascending. Size classes: tiny <10 KB · small 10–200 KB · medium 200 KB–5 MB · large 5–50 MB · huge >50 MB

Idx | Technical Filename | Size (bytes) | Cat | Biological Context
F01 | F01_TRX_Hs_Retina_NM_001014890.2.fasta | 2,530 | TRX | Human Retina mRNA; baseline scanner calibration
F02 | F02_TRX_Hs_SYN1_NM_006950.3.fasta | 3,330 | TRX | Synapsin I transcript; neuronal signal regulation
F03 | F03_TRX_Hs_NR4A2_NM_006186.4.fasta | 3,631 | TRX | Nuclear receptor; dopamine neuron development
F04 | F04_TRX_Hs_GNG2_NM_001243773.2.fasta | 3,711 | TRX | G-protein subunit; essential signaling molecule
F05 | F05_TRX_Hs_GNB4_NM_021629.4.fasta | 6,471 | TRX | Guanine nucleotide-binding protein; signal routing
F06 | F06_VIR_HIV1_AF033819.3.fasta | 9,349 | VIR | HIV-1 Genome; high mutation rate, chaotic structure
F07 | F07_VIR_Rabies_NC_001542.1.fasta | 12,147 | VIR | Rabies virus; negative-strand RNA architecture
F08 | F08_EUK_Ce_chrM_NC_001328.1.fasta | 14,086 | EUK | C. elegans mitochondrial DNA
F09 | F09_ORG_Mm_mito_NC_005089.1.fasta | 16,590 | ORG | Mouse mitochondrial DNA; "cellular battery" genome
F10 | F10_ORG_Hs_mito_NC_012920.1.fasta | 16,864 | ORG | Human mitochondrial DNA; conserved circular structure
F11 | F11_ORG_Zm_mito_NC_007982.1.fasta | 16,958 | ORG | Maize (Zea mays) mitochondrion; plant organelle
F12 | F12_VIR_SARS2_NC_045512.2.fasta | 30,429 | VIR | SARS-CoV-2; long RNA virus with high secondary structure
F13 | F13_ORG_At_chloro_NC_000932.1.fasta | 156,749 | ORG | Arabidopsis chloroplast DNA; photosynthetic machinery
F14 | F14_EUK_Hs_AC010145.fasta | 204,719 | EUK | Human MYCN region; amplified in certain cancers
F15 | F15_EUK_Sc_chrI_NC_001133.9.fasta | 233,584 | EUK | Yeast Chromosome I; smallest eukaryotic chromosome
F16 | F16_EUK_Sc_chrVIII_NC_001140.6.fasta | 572,081 | EUK | Yeast Chromosome VIII; medium-scale fungal DNA
F17 | F17_EUK_Sc_chrV_NC_001137.3.fasta | 586,543 | EUK | Yeast Chromosome V
F18 | F18_EUK_Sc_chrII_NC_001134.8.fasta | 826,794 | EUK | Yeast Chromosome II
F19 | F19_EUK_Sc_chrXIII_NC_001145.3.fasta | 939,899 | EUK | Yeast Chromosome XIII
F20 | F20_EUK_Sc_chrXVI_NC_001148.4.fasta | 963,926 | EUK | Yeast Chromosome XVI
F21 | F21_BAC_Syn1_CP002027.1.fasta | 1,094,313 | BAC | Synthetic Mycoplasma; first "man-made" genome
F22 | F22_EUK_Sc_chrXII_NC_001144.5.fasta | 1,096,206 | EUK | Yeast Chromosome XII
F23 | F23_EUK_Sc_chrVII_NC_001139.9.fasta | 1,109,182 | EUK | Yeast Chromosome VII (Sample A)
F24 | F24_EUK_Sc_chrVII_NC_001139.9.fasta | 1,109,182 | EUK | Yeast Chromosome VII (Control Duplicate)
F25 | F25_EUK_Sc_chrXV_NC_001147.6.fasta | 1,109,537 | EUK | Yeast Chromosome XV
F26 | F26_EUK_Sc_chrIV_NC_001136.10.fasta | 1,557,523 | EUK | Yeast Chromosome IV; largest of yeast test set
F27 | F27_ARC_Mj_NC_000909.1.fasta | 1,688,828 | ARC | M. jannaschii; Archaea extremophile
F28 | F28_BAC_Drad_chr1_NC_001263.1.fasta | 2,686,574 | BAC | D. radiodurans; radiation-resistant genome
F29 | F29_BAC_Ecoli_U00096.3.fasta | 4,708,032 | BAC | E. coli K12; "Rosetta Stone" of bacterial genomics
F30 | F30_EUK_Ce_chrI_NC_003279.8.fasta | 15,323,699 | EUK | Roundworm Chromosome I; transition to multicellularity
F31 | F31_EUK_Ce_chrX_NC_003284.9.fasta | 18,014,315 | EUK | Roundworm Chromosome X
F32 | F32_EUK_Ce_chrV_NC_003283.11.fasta | 21,272,974 | EUK | Roundworm Chromosome V
F33 | F33_EUK_Os_chr10_NC_008391.2.fasta | 23,594,136 | EUK | Rice (Oryza sativa) Chromosome 10; plant genome
F34 | F34_EUK_At_chr5_NC_003076.8.fasta | 27,425,149 | EUK | Arabidopsis Chromosome 5
F35 | F35_EUK_At_chr1_NC_003070.9.fasta | 30,934,854 | EUK | Arabidopsis Chromosome 1; model plant architecture
F36 | F36_EUK_Hs_chr21_NC_000021.9.fasta | 47,488,540 | EUK | Human Chromosome 21; smallest human autosome
F37 | F37_EUK_Hs_chr22_NC_000022.11.fasta | 51,665,500 | EUK | Human Chromosome 22
F38 | F38_EUK_Hs_chrY_CM000686.2_sample.fasta | 58,181,267 | EUK | Human Y Chromosome; unique repeat structures
F39 | F39_EUK_Hs_chr19_NC_000019.10.fasta | 59,594,634 | EUK | Human Chromosome 19; gene-dense
F40 | F40_EUK_Hs_chr20_NC_000020.11.fasta | 65,518,294 | EUK | Human Chromosome 20
F41 | F41_EUK_Hs_chr18_NC_000018.10.fasta | 81,712,897 | EUK | Human Chromosome 18
F42 | F42_EUK_Hs_chr17_NC_000017.11.fasta | 84,645,123 | EUK | Human Chromosome 17
F43 | F43_EUK_Hs_chr16_NC_000016.10.fasta | 91,844,042 | EUK | Human Chromosome 16
F44 | F44_EUK_Hs_chr15_NC_000015.10.fasta | 103,691,101 | EUK | Human Chromosome 15
F45 | F45_EUK_Hs_chr14_NC_000014.9.fasta | 108,827,838 | EUK | Human Chromosome 14
F46 | F46_EUK_Hs_chr13_NC_000013.11.fasta | 116,270,459 | EUK | Human Chromosome 13
F47 | F47_EUK_Hs_chr12_NC_000012.12.fasta | 135,496,623 | EUK | Human Chromosome 12
F48 | F48_EUK_Hs_chr10_NC_000010.11.fasta | 136,027,438 | EUK | Human Chromosome 10
F49 | F49_EUK_Hs_chr11_NC_000011.10.fasta | 137,338,124 | EUK | Human Chromosome 11
F50 | F50_EUK_Hs_chr09_NC_000009.12.fasta | 140,701,352 | EUK | Human Chromosome 9; maximum complexity stress test
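
The size classes listed above the manifest can be applied mechanically to the byte counts in the table. The following is a minimal sketch, assuming decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes), which matches how the page rounds 140,701,352 bytes to 140 MB, and assuming each upper bound is exclusive. The source states neither convention, and `size_class` is a hypothetical helper name.

```python
# Thresholds in bytes. ASSUMPTION: decimal units, upper bounds exclusive.
KB = 1_000
MB = 1_000_000


def size_class(n_bytes: int) -> str:
    """Map a file size in bytes to one of the manifest's five size classes."""
    if n_bytes < 10 * KB:
        return "tiny"
    if n_bytes < 200 * KB:
        return "small"
    if n_bytes < 5 * MB:
        return "medium"
    if n_bytes < 50 * MB:
        return "large"
    return "huge"


# A few sizes copied from the manifest.
examples = {
    "F01": 2_530,
    "F13": 156_749,
    "F14": 204_719,
    "F29": 4_708_032,
    "F30": 15_323_699,
    "F50": 140_701_352,
}
for idx, n_bytes in examples.items():
    print(idx, n_bytes, size_class(n_bytes))
```

The unit convention matters at the edges: F14 (204,719 bytes) is medium under decimal units but would be small if 1 KB meant 1,024 bytes, since 200 × 1,024 = 204,800.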
Research Design

Why These 50 Genomes?

Design Principle
Entropy Gradient
Measure how "mathematical randomness" increases from F21 (a minimalist, human-designed synthetic genome) to F50 (a massive, evolved human chromosome rich in so-called "junk" DNA). File sizes across the collection span nearly five orders of magnitude, from 2,530 bytes (F01) to 140,701,352 bytes (F50).
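
One simple way to put a number on "mathematical randomness" is the order-0 Shannon entropy of the sequence, in bits per base. The sketch below is an illustration only: the source does not say which entropy measure the framework uses, and `read_fasta_sequence` and `shannon_entropy` are hypothetical helper names.

```python
import math
from collections import Counter


def read_fasta_sequence(path: str) -> str:
    """Concatenate all sequence lines of a FASTA file, skipping '>' headers."""
    parts = []
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip().upper())
    return "".join(parts)


def shannon_entropy(seq: str) -> float:
    """Order-0 Shannon entropy in bits per symbol (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    total = len(seq)
    counts = Counter(seq)
    # H = sum over symbols of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A sequence with equal counts of A, C, G, and T reaches the 2.0-bit maximum for a four-letter alphabet, while a single repeated base gives 0.0. Order-0 entropy ignores context, so a higher-order or compression-based estimate would likely be needed to separate real genomes.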
Taxonomy
All Three Domains of Life
The collection includes representatives from Bacteria, Archaea, and Eukaryota, plus viruses, organelle genomes, and human transcripts, so that mathematical patterns are tested across the broadest possible phylogenetic diversity.
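
Coverage per category can be checked directly from the manifest filenames, which appear to follow the pattern `F<index>_<category>_<description>.fasta`. The sketch below assumes that pattern, which is inferred from the listed names rather than documented; `parse_filename` and `tally_categories` are hypothetical helpers.

```python
import re
from collections import Counter

# ASSUMPTION: pattern inferred from the manifest, not a documented spec.
FILENAME_RE = re.compile(r"^(F\d{2})_([A-Z]{3})_(.+)\.fasta$")


def parse_filename(name: str) -> tuple[str, str, str]:
    """Split a manifest filename into (index, category code, description)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected manifest filename: {name!r}")
    return match.group(1), match.group(2), match.group(3)


def tally_categories(names) -> Counter:
    """Count how many filenames fall under each category code."""
    return Counter(parse_filename(name)[1] for name in names)


# A small sample of filenames copied from the manifest.
sample = [
    "F01_TRX_Hs_Retina_NM_001014890.2.fasta",
    "F06_VIR_HIV1_AF033819.3.fasta",
    "F27_ARC_Mj_NC_000909.1.fasta",
    "F29_BAC_Ecoli_U00096.3.fasta",
    "F50_EUK_Hs_chr09_NC_000009.12.fasta",
]
print(tally_categories(sample))
```

Running the tally over all 50 filenames gives a quick consistency check between the manifest and the per-category counts reported for the collection.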
Key Boundary
Complexity Transitions
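The jump between F29 (E. coli, 4.7 MB) and F30 (Roundworm chr I, 15.3 MB) is treated as the most significant mathematical boundary: it separates the largest prokaryotic genome in the set from the first chromosome of a multicellular eukaryote, where genome organization fundamentally changes. It also coincides with the step from the medium to the large size class. Smaller eukaryotic sequences, such as the yeast chromosomes, sit below this boundary in the size-sorted manifest.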
Control Pair: F23 & F24

Both are Yeast Chromosome VII (NC_001139.9), stored as identical 1,109,182-byte files. This deliberate duplicate checks that every encoder produces byte-for-byte identical output on identical input, validating deterministic behavior across all benchmark runs.
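
The control check can be sketched as two steps: confirm the two input files are byte-identical, then confirm the encoder output matches. This is a minimal illustration, not the framework's own harness; `sha256_of_file` and `control_pair_ok` are hypothetical names, and `zlib.compress` in the usage note is only a stand-in for whichever encoders the benchmark actually runs.

```python
import hashlib
import zlib
from typing import Callable


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def control_pair_ok(encode: Callable[[bytes], bytes],
                    path_a: str, path_b: str) -> bool:
    """True if both files are byte-identical and encode to identical output."""
    if sha256_of_file(path_a) != sha256_of_file(path_b):
        return False  # inputs differ, so the pair is not a valid control
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return encode(file_a.read()) == encode(file_b.read())
```

For example, `control_pair_ok(zlib.compress, "F23_EUK_Sc_chrVII_NC_001139.9.fasta", "F24_EUK_Sc_chrVII_NC_001139.9.fasta")` would be expected to return `True` for a deterministic encoder, provided both files are present in the working directory.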
