Recovering a lifetime of music from lossy MP3 archives — using AI super-resolution, stem separation, and intelligent batch processing to rebuild what compression discarded.
The original recordings were captured at 44.1 kHz, 16-bit — CD quality. They were then encoded to MP3, likely at 128–256 kbps. That encoding process made a series of permanent, one-way decisions about what information was "expendable."
| Damage | Description | Recovery outlook |
|---|---|---|
| High-frequency loss | MP3 aggressively cuts content above 16–18 kHz. Cymbals, string harmonics, the "air" of a recording, the sibilance of vocals — all truncated or smeared by codec artifacts. | Partial recovery possible via AI |
| Transient smearing | Sharp attack sounds — snare hits, plucked strings, consonants — are blurred by MP3's block-based transform (MDCT). The "snap" of a drum becomes a slightly softer thud. | Limited recovery |
| Pre-echo | A characteristic MP3 defect where a faint "ghost" of a transient appears slightly before the hit. Particularly audible in guitar attacks and percussion in quiet passages. | Detectable, partially fixable |
| Codec noise floor | Low-bitrate MP3s introduce a faint but audible noise floor — a mushy, granular texture underneath the music. Especially noticeable in quiet passages and reverb tails. | Good recovery via DeepFilter |
| Stereo collapse | MP3 uses "joint stereo," which can collapse the stereo field at lower bitrates, making everything sound narrower and more mono-like than the original mix. | AI mastering can help |
| Flattened dynamics | The encode/decode cycle introduces subtle changes to the amplitude envelope. Combined with the noise floor, the dynamics feel slightly flattened compared to the original. | Addressable at mastering stage |

Critical reality check: MP3 encoding is a lossy, one-way compression. The original information is permanently gone. AI tools do not "recover" the original — they reconstruct plausible approximations of what was likely there, using models trained on uncompressed audio. The result is a high-quality reconstruction, not a restoration of truth.
A full-featured AI restoration pipeline written by Claude for Bojan. Takes MP3 files and produces multiple comparison variants side-by-side, so you can A/B test which processing chain sounds best for each piece of material.
| Variant Folder | What It Does | Status |
|---|---|---|
| 01_baseline/ | Straight MP3 → 24-bit WAV/FLAC decode. Reference track — no AI. | ✅ Working |
| 02_denoised/ | DeepFilterNet noise suppression. Risky on music — can strip harmonics. Requires Rust toolchain. | ⚠️ Disabled (Rust) |
| 03_upscaled/ | AudioSR super-resolution directly on the decoded MP3. Reconstructs high-frequency content. | ✅ Running |
| 04_upscaled_lp/ | Applies a 16 kHz lowpass filter first, then AudioSR. Gives AI a cleaner base to work from. | ✅ Running |
| 05_full_chain/ | DeepFilterNet → AudioSR combo. Best expected quality. Requires DeepFilterNet to be enabled. | ⚠️ Skipped (no DF) |
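The 16 kHz pre-lowpass behind 04_upscaled_lp/ can be illustrated with a simple FFT brickwall. This is a minimal numpy sketch of the idea, not the script's actual filter (which the source does not specify); a production filter would more likely be a Butterworth or windowed-sinc design to avoid ringing.

```python
import numpy as np

def lowpass_fft(x: np.ndarray, sr: int, cutoff_hz: float = 16_000.0) -> np.ndarray:
    """Brickwall lowpass: zero every FFT bin above cutoff_hz.

    This removes the mushy residue MP3 leaves near its cutoff, giving the
    super-resolution model a clean band edge to extend from.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))
```

The rationale is the one stated in the table: a clean band edge is an easier starting point for AudioSR than codec-smeared partial highs.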
Key engineering decisions in this tool: resumable runs with .done JSON markers,
atomic FLAC writes (encode to .tmp, rename only on success),
VRAM-adaptive chunking (auto-detects GPU and picks safe chunk sizes),
crossfade stitching (2s overlap to eliminate boundary clicks), and
GPU → CPU fallback if CUDA OOMs.
Gemini's alternative implementation. Uses HiFi-GAN BWE (Bandwidth Extender) instead of AudioSR — a different neural architecture for high-frequency reconstruction. Key differences: processes audio in-process (no subprocess spawning, no model reload per chunk), uses 12-second chunks with crossfade, simpler single-output structure.
Important difference: HiFi-GAN BWE runs the model in-process, loading it once and passing all chunks through it in memory. AudioSR spawns a new subprocess per chunk, reloading the 258M-parameter model each time. HiFi-GAN is therefore much faster per song but may produce lower quality reconstruction than AudioSR's diffusion-based approach.
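The crossfade stitching both implementations rely on reduces to a gain-complementary overlap. A minimal numpy sketch with a linear fade (the fade shape used by the actual scripts is not specified here):

```python
import numpy as np

def stitch(chunks: list[np.ndarray], overlap: int) -> np.ndarray:
    """Concatenate chunks, linearly crossfading over `overlap` samples.

    At each seam the outgoing chunk fades out while the incoming chunk
    fades in; the two gains sum to 1.0, so boundary clicks vanish.
    """
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    out = chunks[0].astype(np.float64)
    for nxt in chunks[1:]:
        nxt = nxt.astype(np.float64)
        seam = out[-overlap:] * fade_out + nxt[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], seam, nxt[overlap:]])
    return out
```

At 48 kHz output, the 2-second overlap mentioned above corresponds to `overlap = 96_000` samples.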
The RTX 3080 Mobile in the Razer Blade Advanced 2021 runs at 105W TDP — roughly equivalent to a desktop RTX 3060/3070 in real compute throughput. AudioSR's 50-step DDIM diffusion sampling takes ~14–20 minutes per 60-second chunk on this hardware. A 9-minute song processed twice (direct + lowpass variant) takes approximately 6–8 hours of GPU time.
Known bug — input path: The batch file passes an ANSI escape code
(\x1b[90m) as the input argument. The script catches this and falls back
to the script directory, but it should be fixed in the batch file by stripping color
codes before passing the path argument.
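Until the batch file is fixed, a script-side defense is to strip ANSI sequences from the argument before validating it. A small sketch (the script's existing fallback logic is described above; `clean_path_arg` is a hypothetical helper name):

```python
import re

# CSI sequences such as \x1b[90m (color) or \x1b[0m (reset)
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def clean_path_arg(raw: str) -> str:
    """Remove ANSI escape codes and surrounding whitespace from a CLI argument."""
    return ANSI_RE.sub("", raw).strip()
```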
AudioSR and HiFi-GAN BWE address only one dimension of what MP3 encoding damaged: the high-frequency spectrum. A complete restoration needs to address every layer of the signal.
| Signal layer | What needs addressing | Restoration approach |
|---|---|---|
| High frequencies (8–24 kHz) | Cymbals, air, shimmer. | Reconstructed by AudioSR / HiFi-GAN |
| Bass (20–200 Hz) | MP3 is actually fairly good here, but the codec noise floor affects the low end too. | Dedicated bass enhancement or stem processing |
| Timing | Individual musicians can play slightly off the grid. MP3 encoding doesn't fix or worsen this — it's a recording characteristic — but once stems are separated, timing correction on individual instruments is possible. | Requires stem separation first |
| Dynamics | The overall loudness balance, compression, and dynamic range of the mixed track. AI mastering tools can intelligently adjust these to modern standards without squashing the life out of the music. | AI mastering stage |
| Stereo image | Width, depth, and placement of instruments in the stereo image. MP3 joint stereo can narrow this. AI tools and mid-side processing can widen and restore spatial character. | Mastering + stem processing |
| Codec artifacts | The distinctive "MP3 sound" — pre-echo, ringing, granular noise in reverb tails. These are not simply noise; they are structured artifacts that require specialized removal tools. | DeepFilterNet / iZotope RX |
| Pitch | MP3 does not affect pitch — this is a recording characteristic. However, auto-tune or pitch correction on vocals or lead instruments is possible once stems are separated. | Optional, post-separation |

| Tool | Model Type | Speed | Quality | Notes |
|---|---|---|---|---|
| AudioSR | Diffusion (50-step DDIM) | ~15 min / 60s chunk | ⭐⭐⭐⭐⭐ | Best quality, slowest. 258M params. Subprocess per chunk = model reload overhead. |
| HiFi-GAN BWE | GAN (single pass) | ~30s / 12s chunk | ⭐⭐⭐⭐ | Much faster, good quality. In-process, no reload. Gemini's choice. 48kHz output. |
| DeepFilterNet 3 | RNN + spectral | Real-time | ⭐⭐⭐ | Noise suppression, not upscaling. Speech-trained — can damage music harmonics. Use carefully. |
Stem separation is the key unlocking step for the entire advanced pipeline. Once a song is split into individual stems, every other processing step becomes dramatically more effective and controllable — you process each instrument optimally, then remix.
| Tool | Stems | Quality | Speed | Notes |
|---|---|---|---|---|
| Demucs htdemucs_ft | Vocals, Drums, Bass, Other | ⭐⭐⭐⭐⭐ | ~2–5 min/song GPU | Meta AI. State of the art. Free, open source. pip install demucs. Best overall. |
| Demucs htdemucs_6s | + Guitar, Piano (6 stems) | ⭐⭐⭐⭐ | ~3–6 min/song GPU | 6-stem variant. More useful for complex arrangements. Slightly lower per-stem quality. |
| Spleeter (Deezer) | 2, 4, or 5 stems | ⭐⭐⭐ | Very fast | Older, lower quality than Demucs. Not recommended for quality-focused work. |
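A thin wrapper for the planned separation step could shell out to the Demucs CLI. The command shape below uses Demucs's documented `-n` (model) and `-o` (output directory) flags; `separate_song` and the stem-gathering logic are a hypothetical sketch, not the project's actual module.

```python
import subprocess
from pathlib import Path

def demucs_cmd(mp3: Path, out_dir: Path, model: str = "htdemucs_ft") -> list[str]:
    """Build the Demucs CLI invocation for one song."""
    return ["demucs", "-n", model, "-o", str(out_dir), str(mp3)]

def separate_song(mp3: Path, out_dir: Path, model: str = "htdemucs_ft") -> dict[str, Path]:
    """Run Demucs and return the four stem paths it writes.

    Demucs writes stems under <out_dir>/<model>/<track-name>/.
    """
    subprocess.run(demucs_cmd(mp3, out_dir, model), check=True)
    stem_dir = out_dir / model / mp3.stem
    return {name: stem_dir / f"{name}.wav"
            for name in ("vocals", "drums", "bass", "other")}
```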
On your question about bass: Once Demucs separates the bass stem, you can process it independently — apply HiFi-GAN or AudioSR to the isolated bass track (much better results than on a full mix), optionally apply a bass enhancer, then mix back. The bass in an MP3 is actually less damaged than the highs, but isolation + enhancement still yields audible improvement.
Once the drum stem is isolated, transient detection + quantization can tighten a slightly loose drummer. Tools: madmom for beat detection, librosa for time-stretching with phase vocoder.
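Once onsets are detected (e.g. by madmom), the quantization math itself is simple. A hypothetical numpy sketch that snaps onset times part-way toward a 16th-note grid; the actual time-stretching via librosa's phase vocoder is out of scope here.

```python
import numpy as np

def snap_onsets(onsets_s: np.ndarray, bpm: float, strength: float = 0.5) -> np.ndarray:
    """Move each onset toward the nearest 16th-note grid line.

    strength=0.0 leaves timing untouched, 1.0 hard-quantizes. Partial
    strength preserves human feel while tightening obvious stragglers.
    """
    grid_step = 60.0 / bpm / 4.0            # 16th-note duration in seconds
    nearest = np.round(onsets_s / grid_step) * grid_step
    return onsets_s + strength * (nearest - onsets_s)
```

The per-hit offsets (snapped minus original) then become the time-stretch targets for the phase-vocoder move.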
- **Bass stem (Phase 2):** Apply AudioSR or HiFi-GAN BWE to the isolated bass stem. Dramatically more effective than full-mix processing because the AI isn't confused by other instruments.
- **Vocal stem (Phase 2):** Apply DeepFilterNet or RNNoise to the isolated vocal stem (where speech-trained models actually shine). Then AudioSR for high-frequency detail. Optional: subtle pitch correction.
- **"Other" stem (Phase 2):** Guitars, keyboards, synths — process with AudioSR for high-frequency extension. This stem benefits most from diffusion-based reconstruction of harmonic overtones.
- **Remix (Phase 2):** After processing each stem independently, recombine them using Python's soundfile + numpy. Apply individual volume balancing. The remixed result is often noticeably better than the original mix.
- **Mastering (Phase 3):** Final-stage loudness normalization, EQ, stereo width enhancement. Python tools: pyloudnorm for LUFS normalization. Or integrate with the LANDR API or iZotope Ozone (if available).
- **Loudness target (Phase 3):** Normalize final output to the streaming standard (−14 LUFS integrated). Ensures consistent playback volume across all restored tracks. Free, simple, measurable.

The complete vision — all phases combined — processes each MP3 through a multi-stage chain. Not all stages are required for every song; the tool will offer granular control over which stages to enable.
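The remix step described above (soundfile + numpy) reduces to a gain-weighted sum plus a safety check against clipping. A minimal sketch, in which `remix`, the `gains` dict, and the peak guard are illustrative choices; pyloudnorm would handle the real loudness work afterward.

```python
import numpy as np

def remix(stems: dict[str, np.ndarray], gains: dict[str, float]) -> np.ndarray:
    """Sum processed stems with per-stem gains, then protect against clipping."""
    mix = sum(gains.get(name, 1.0) * audio for name, audio in stems.items())
    peak = np.max(np.abs(mix))
    if peak > 1.0:            # crude peak guard; real mastering uses a limiter
        mix = mix / peak
    return mix
```

In the real pipeline each stem would be read with `soundfile.read` and the result written back as 24-bit FLAC with `soundfile.write`.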
The variants model: Rather than picking one "best" chain, the tool produces multiple variants at key decision points so you can compare and choose the best result for each individual song. Music is subjective — what sounds best for a jazz recording differs from what sounds best for heavy metal.
    # Song: "Aircraft Carrier.mp3"
    AI_Restored_Variants/
        01_baseline/          ← Clean decode, no AI (reference)
            Aircraft-Carrier.flac
        02_audiosr_direct/    ← AudioSR on full mix (Phase 1 ✓)
            Aircraft-Carrier.up.flac
        03_audiosr_lp/        ← Lowpass → AudioSR (Phase 1 ✓)
            Aircraft-Carrier.lp_up.flac
        04_stems/             ← Demucs 4-stem separation (Phase 2)
            vocals.flac  drums.flac  bass.flac  other.flac
        05_stems_enhanced/    ← Per-stem AI processing (Phase 2)
            vocals.enhanced.flac  drums.enhanced.flac  bass.enhanced.flac  other.enhanced.flac
        06_remixed/           ← Stems recombined (Phase 2)
            Aircraft-Carrier.remix.flac
        07_mastered/          ← LUFS normalized, final (Phase 3)
            Aircraft-Carrier.master.flac
        _done/                ← Resume markers (JSON fingerprints)
        _logs/                ← Per-file processing logs
The goal is a single Python script — mp3restore.py — with a colorful interactive
batch menu. No Windows GUI. All control via keyboard. Works from CMD or PowerShell.
The menu drives everything, with full CLI pass-through for automation.
- **Main script (mp3restore.py):** Menu, CLI argument parsing, hardware detection, file scanning, progress tracking, resume logic. All color output. Imports the modules below. Status: Core.
- **AudioSR runner:** Wraps the AudioSR CLI with chunking, crossfade stitching, GPU→CPU fallback. Carries forward all logic from the current mp3wav_cla.py. Status: Phase 1 done.
- **HiFi-GAN runner:** In-process HiFi-GAN BWE. Fast alternative to AudioSR. Model loaded once, all chunks processed without subprocess overhead. Status: Phase 1 done (gem).
- **demucs_runner.py:** Wraps Demucs htdemucs_ft. Splits each song to 4 stems, saves them as individual 24-bit FLACs. The critical enabler for all Phase 2 work. Status: Phase 2 — to build.
- **stem_processor.py:** Per-stem AI enhancement. Routes each stem to the optimal model: vocals → DeepFilter+AudioSR, bass/other → HiFi-GAN, drums → transient sharpening. Status: Phase 2 — to build.
- **mixer_master.py:** Recombines stems, applies LUFS normalization (pyloudnorm), stereo width enhancement, final FLAC-12 encode. Target: −14 LUFS for streaming. Status: Phase 3 — to build.

Reliability guarantees carried through every module:

- Every completed step writes a .done.json fingerprint. Crash → restart → the run continues where it left off automatically.
- Output is written to .tmp first and renamed to its final name only on verified success. No half-written files.

| # | Task | Phase | Effort | Status |
|---|---|---|---|---|
| 01 | Fix batch file ANSI escape code input bug | 1 | Trivial (5 min) | Quick fix |
| 02 | Fix AudioSR model-reload-per-chunk (in-process wrapper) | 1 | Medium (1–2h) | High value |
| 03 | Finish current 10-song AudioSR batch | 1 | ~3–4 days GPU | Running |
| 04 | Install & test Demucs htdemucs_ft | 2 | 1–2h setup | Next up |
| 05 | Build demucs_runner.py module | 2 | Half day | Planned |
| 06 | Build stem_processor.py — per-stem routing logic | 2 | 1 day | Planned |
| 07 | Build mixer_master.py — recombine + LUFS normalize | 3 | Half day | Future |
| 08 | Unified mp3restore.py with full interactive menu | All | 1–2 days | Future |
| 09 | Drum timing correction module (madmom + librosa) | 2+ | 1 day R&D | Research |
| 10 | Batch compare/report — side-by-side variant quality metrics | 3 | Half day | Future |
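For task 10, one cheap objective metric is the high-band energy ratio: how much energy a variant carries above 16 kHz relative to the baseline decode. A hypothetical numpy sketch (a real report would add spectrogram images and loudness statistics):

```python
import numpy as np

def high_band_ratio(variant: np.ndarray, baseline: np.ndarray, sr: int,
                    cutoff_hz: float = 16_000.0) -> float:
    """Energy above cutoff_hz in `variant` divided by the same band in `baseline`.

    Values well above 1.0 mean the AI added high-frequency content;
    the number says nothing about whether that content sounds right.
    """
    def band_energy(x: np.ndarray) -> float:
        spectrum = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
        return float(np.sum(spectrum[freqs > cutoff_hz] ** 2))
    return band_energy(variant) / max(band_energy(baseline), 1e-12)
```

Metrics like this complement, but never replace, the A/B listening the document insists on.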
Fix the model-reload bug first. The single biggest quality-of-life improvement available right now, before Phase 2 even starts, is rewriting the AudioSR wrapper to run in-process, as HiFi-GAN does in mp3wav_gem.py. This eliminates the ~10–15 seconds of Python startup and model loading per chunk: on a 10-chunk song that is 100–150 seconds of pure overhead, and across the full 10-song batch roughly 17–25 minutes of total runtime. Against AudioSR's ~14–20 minutes of actual processing per chunk the percentage is modest, but it is free speed at no quality cost.
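The load-once pattern the rewrite would adopt is straightforward. Below is the shape of it, with `load_audiosr_model` standing in for whatever the real AudioSR Python API exposes; this illustrates the pattern, not AudioSR's actual interface.

```python
class InProcessUpscaler:
    """Load the heavy model once; reuse it for every chunk of every song."""

    def __init__(self, load_model):
        self._load_model = load_model
        self._model = None
        self.load_count = 0           # instrumentation: proves a single load

    def _get_model(self):
        if self._model is None:       # only the first call pays the load cost
            self._model = self._load_model()
            self.load_count += 1
        return self._model

    def process(self, chunk):
        return self._get_model()(chunk)

# Stand-in for the expensive 258M-parameter load:
def load_audiosr_model():
    return lambda chunk: chunk        # identity "model" for illustration
```

Contrast this with the current subprocess-per-chunk flow, which pays the full interpreter startup and model load on every chunk.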
This is important context for managing expectations. AI audio tools are impressive, but they operate under fundamental constraints.
Reconstruction is not recovery. When AudioSR "restores" high frequencies, it is generating statistically plausible audio based on patterns learned from training data. The output is not the original recording — it is a new signal that sounds like what the original probably sounded like. On average this is excellent. For specific moments (a particular cymbal hit, a guitar harmonic) it may be wrong in ways that are subtly audible to trained ears.
If a specific note's overtones were discarded by MP3, no AI can know what those specific overtones were. It will reconstruct something plausible, not the original truth.
If the recording itself was poorly made — bad mic placement, room acoustics issues, a musician who was genuinely out of sync — AI restoration cannot fix the underlying recording problem.
Diffusion models can occasionally generate high-frequency content that wasn't in the original at all — musical artifacts that are plausible but wrong. A/B comparison with the baseline is always necessary.
Demucs is remarkable but not perfect. Some bass frequencies appear in the "other" stem, some drum reverb bleeds into vocals. Per-stem processing must account for this bleed.
This model was trained on speech. It may identify musical harmonics and overtones as "noise" and reduce them. Only safe to use on isolated vocal stems, not full mixes. Test carefully.
The output will sound significantly better than the original MP3 to most listeners on most material. It will not be identical to the lost original. It is the best reconstruction currently possible.
The MP3 Resurrection Project is a serious, well-motivated application of current AI audio technology.
The foundation — mp3wav_cla.py with AudioSR — is already producing high-quality results.
Phase 2 (Demucs stem separation + per-stem enhancement) will unlock a qualitatively different
level of restoration by allowing targeted AI processing of each instrument independently.
Phase 3 (remix + mastering) will bring the final output to modern streaming quality.
The most important discipline throughout is critical listening with A/B comparison. Every variant produced is a hypothesis. The human ear is the final judge.
To continue in a new session: Open this HTML file, share it with Claude, and say: "We made this so far — look at the paper and let's continue." All context, decisions, tool choices, and the build plan are documented here.