8Z Research · Bojan Dobrečevič · AIM³ Institute Ljubljana

MP3 Resurrection Project

Recovering a lifetime of music from lossy MP3 archives — using AI super-resolution, stem separation, and intelligent batch processing to rebuild what compression discarded.

Status: Active Development  ·  Phase: 1 of 3 (AudioSR Pipeline running)  ·  Hardware: RTX 3080 Mobile · 8 GB VRAM · 16 CPU workers

Contents

  1. The Problem — What MP3 Destroyed
  2. What We Have Now — Current Tools
  3. Understanding the Gaps — All Dimensions of Degradation
  4. The Full AI Toolkit — What Each Tool Does
  5. The Master Pipeline — Complete Processing Chain
  6. The Unified Tool — Architecture & Menu Design
  7. Roadmap — Build Order & Priorities
  8. Honest Limits — What AI Cannot Recover

The Problem — What MP3 Destroyed

The original recordings were captured at 44.1 kHz, 16-bit — CD quality. They were then encoded to MP3, likely at 128–256 kbps. That encoding process made a series of permanent, one-way decisions about what information was "expendable."

🎵

High Frequencies Discarded

MP3 aggressively cuts content above 16–18 kHz. Cymbals, string harmonics, the "air" of a recording, the sibilance of vocals — all truncated or smeared by codec artifacts.

Partial recovery possible via AI
🥁

Transient Smearing

Sharp attack sounds — snare hits, plucked strings, consonants — are blurred by MP3's block-based transform (MDCT). The "snap" of a drum becomes a slightly softer thud.

Limited recovery
🎸

Pre-echo Artifacts

A characteristic MP3 defect where a faint "ghost" of a transient appears slightly before the hit. Particularly audible in guitar attacks and percussion in quiet passages.

Detectable, partially fixable
🎹

Bitrate Noise Floor

Low-bitrate MP3s introduce a faint but audible noise floor — a mushy, granular texture underneath the music. Especially noticeable in quiet passages and reverb tails.

Good recovery via DeepFilter
🎤

Stereo Image Collapse

MP3 uses "joint stereo" which can collapse the stereo field at lower bitrates, making everything sound narrower and more mono-like than the original mix.

AI mastering can help
📉

Dynamic Range Compression

The encode/decode cycle introduces subtle changes to the amplitude envelope. Combined with the noise floor, the dynamics feel slightly flattened compared to the original.

Addressable at mastering stage

Critical reality check: MP3 encoding is a lossy, one-way compression. The original information is permanently gone. AI tools do not "recover" the original — they reconstruct plausible approximations of what was likely there, using models trained on uncompressed audio. The result is a high-quality reconstruction, not a restoration of truth.


What We Have Now — Current Tools

mp3wav_cla.py — Running
mp3wav_gem.py — Available
Batch of 10 songs — In progress
Phase 2 tools — Not built yet

mp3wav_cla.py Claude · v1.3.9 · Primary

A full-featured AI restoration pipeline written by Claude for Bojan. Takes MP3 files and produces multiple comparison variants side-by-side, so you can A/B test which processing chain sounds best for each piece of material.

Variant Folder | What It Does | Status
01_baseline/ | Straight MP3 → 24-bit WAV/FLAC decode. Reference track — no AI. | ✅ Working
02_denoised/ | DeepFilterNet noise suppression. Risky on music — can strip harmonics. Requires Rust toolchain. | ⚠️ Disabled (Rust)
03_upscaled/ | AudioSR super-resolution directly on the decoded MP3. Reconstructs high-frequency content. | ✅ Running
04_upscaled_lp/ | Applies a 16 kHz lowpass filter first, then AudioSR. Gives the AI a cleaner base to work from. | ✅ Running
05_full_chain/ | DeepFilterNet → AudioSR combo. Best expected quality. Requires DeepFilterNet to be enabled. | ⚠️ Skipped (no DF)

Key engineering decisions in this tool: resumable runs with .done JSON markers, atomic FLAC writes (encode to .tmp, rename only on success), VRAM-adaptive chunking (auto-detects GPU and picks safe chunk sizes), crossfade stitching (2s overlap to eliminate boundary clicks), and GPU → CPU fallback if CUDA OOMs.
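The resume-marker and atomic-write patterns above can be sketched in a few lines. This is an illustrative reconstruction, not the actual mp3wav_cla.py internals — function names and the marker layout are placeholders:

```python
import json
import os
import tempfile

def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write to a temp file in the target directory, then rename into place.
    The rename is atomic on the same filesystem, so a crash or OOM mid-encode
    never leaves a half-written output file behind."""
    tmp_fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(tmp_fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise

def mark_done(done_dir: str, song: str, fingerprint: dict) -> None:
    """Drop a JSON resume marker; a restarted run skips songs whose marker exists."""
    os.makedirs(done_dir, exist_ok=True)
    atomic_write_bytes(os.path.join(done_dir, song + ".done"),
                       json.dumps(fingerprint).encode("utf-8"))

def is_done(done_dir: str, song: str) -> bool:
    return os.path.exists(os.path.join(done_dir, song + ".done"))
```

Because the `.done` marker is itself written atomically, an interrupted run can never record a song as finished that wasn't.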

mp3wav_gem.py Gemini · Simpler · HiFi-GAN

Gemini's alternative implementation. Uses HiFi-GAN BWE (Bandwidth Extender) instead of AudioSR — a different neural architecture for high-frequency reconstruction. Key differences: processes audio in-process (no subprocess spawning, no model reload per chunk), uses 12-second chunks with crossfade, simpler single-output structure.

Important difference: HiFi-GAN BWE runs the model in-process, loading it once and passing all chunks through it in memory. AudioSR spawns a new subprocess per chunk, reloading the 258M-parameter model each time. HiFi-GAN is therefore much faster per song but may produce lower quality reconstruction than AudioSR's diffusion-based approach.
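Both scripts stitch overlapping chunks with a crossfade to avoid boundary clicks. The core of that trick is a few lines of numpy — a sketch with a linear fade; the actual scripts' window shape and overlap length (2 s in mp3wav_cla.py) may differ:

```python
import numpy as np

def stitch(chunks: list[np.ndarray], overlap: int) -> np.ndarray:
    """Join processed chunks that overlap by `overlap` samples, linearly
    crossfading each seam so chunk boundaries produce no audible clicks."""
    out = chunks[0].astype(np.float64)
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    for nxt in chunks[1:]:
        nxt = nxt.astype(np.float64)
        # Blend the tail of the accumulated audio with the head of the next chunk.
        out[-overlap:] = out[-overlap:] * fade_out + nxt[:overlap] * fade_in
        out = np.concatenate([out, nxt[overlap:]])
    return out
```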

AudioSR Performance Reality RTX 3080 Mobile · 8 GB VRAM

The RTX 3080 Mobile in the Razer Blade Advanced 2021 runs at 105W TDP — roughly equivalent to a desktop RTX 3060/3070 in real compute throughput. AudioSR's 50-step DDIM diffusion sampling takes ~14–20 minutes per 60-second chunk on this hardware. A 9-minute song processed twice (direct + lowpass variant) takes approximately 6–8 hours of GPU time.

Known bug — input path: The batch file passes an ANSI escape code (\x1b[90m) as the input argument. The script catches this and falls back to the script directory, but it should be fixed in the batch file by stripping color codes before passing the path argument.
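Until the batch file is fixed, the script side can also defensively sanitize its path argument. A minimal sketch of stripping ANSI CSI sequences such as `\x1b[90m`:

```python
import re

# Matches CSI escape sequences such as "\x1b[90m" (colors, cursor moves).
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def clean_path_arg(arg: str) -> str:
    """Strip ANSI escape codes and surrounding whitespace from a CLI path argument."""
    return ANSI_RE.sub("", arg).strip()
```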


Understanding the Gaps — All Dimensions of Degradation

AudioSR and HiFi-GAN BWE address only one dimension of what MP3 encoding damaged: the high-frequency spectrum. A complete restoration needs to address every layer of the signal.

🔊

Frequency Spectrum

High frequencies (8–24 kHz): Reconstructed by AudioSR / HiFi-GAN. Cymbals, air, shimmer.

Bass (20–200 Hz): MP3 is actually fairly good here, but the codec noise floor affects the low end too. Needs dedicated bass enhancement or stem processing.

AudioSR handles highs only
⏱️

Temporal Accuracy

Individual musicians can play slightly off the grid. MP3 encoding doesn't fix or worsen this — it's a recording characteristic. But once stems are separated, timing correction on individual instruments is possible.

Requires stem separation first
🎚️

Dynamics & Loudness

The overall loudness balance, compression, and dynamic range of the mixed track. AI mastering tools can intelligently adjust these to modern standards without squashing the life out of the music.

AI mastering stage
🌐

Stereo Field

Width, depth, and placement of instruments in the stereo image. MP3 joint stereo can narrow this. AI tools and mid-side processing can intelligently widen and restore spatial character.

Mastering + stem processing
🎭

Codec Artifacts

The distinctive "MP3 sound" — pre-echo, ringing, granular noise in reverb tails. These are not simply noise; they are structured artifacts that require specialized removal tools.

DeepFilterNet / iZotope RX
🎵

Pitch Accuracy

MP3 does not affect pitch — this is a recording characteristic. However, auto-tune or pitch correction on vocals or lead instruments is possible once stems are separated.

Optional, post-separation
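Of the dimensions above, the stereo field has the simplest classical remedy: mid-side widening. A sketch — the width factor is a taste parameter, not a fixed value, and the real mastering stage may use a smarter frequency-dependent approach:

```python
import numpy as np

def widen(stereo: np.ndarray, width: float = 1.3) -> np.ndarray:
    """Mid-side stereo widening. `stereo` has shape (n_samples, 2).
    width > 1 widens, width < 1 narrows, width = 0 collapses to mono."""
    left, right = stereo[:, 0], stereo[:, 1]
    mid = (left + right) / 2.0    # shared (mono) content
    side = (left - right) / 2.0   # spatial difference
    side = side * width
    return np.stack([mid + side, mid - side], axis=1)
```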

The Full AI Toolkit — What Each Tool Does

Tier 1 — Spectrum Restoration Currently Active

Tool | Model Type | Speed | Quality | Notes
AudioSR | Diffusion (50-step DDIM) | ~15 min / 60 s chunk | ⭐⭐⭐⭐⭐ | Best quality, slowest. 258M params. Subprocess per chunk = model reload overhead.
HiFi-GAN BWE | GAN (single pass) | ~30 s / 12 s chunk | ⭐⭐⭐⭐ | Much faster, good quality. In-process, no reload. Gemini's choice. 48 kHz output.
DeepFilterNet 3 | RNN + spectral | Real-time | ⭐⭐⭐ | Noise suppression, not upscaling. Speech-trained — can damage music harmonics. Use carefully.

Tier 2 — Stem Separation Phase 2 · Critical Enabler

Stem separation is the key unlocking step for the entire advanced pipeline. Once a song is split into individual stems, every other processing step becomes dramatically more effective and controllable — you process each instrument optimally, then remix.

Tool | Stems | Quality | Speed | Notes
Demucs htdemucs_ft | Vocals, Drums, Bass, Other | ⭐⭐⭐⭐⭐ | ~2–5 min/song on GPU | Meta AI. State of the art. Free, open source. pip install demucs. Best overall.
Demucs htdemucs_6s | + Guitar, Piano (6 stems) | ⭐⭐⭐⭐ | ~3–6 min/song on GPU | 6-stem variant. More useful for complex arrangements. Slightly lower per-stem quality.
Spleeter (Deezer) | 2, 4, or 5 stems | ⭐⭐⭐ | Very fast | Older, lower quality than Demucs. Not recommended for quality-focused work.

A note on the bass question: once Demucs separates the bass stem, you can process it independently — apply HiFi-GAN or AudioSR to the isolated bass track (much better results than on a full mix), optionally apply a bass enhancer, then mix it back. The bass in an MP3 is actually less damaged than the highs, but isolation plus enhancement still yields an audible improvement.
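The documented way to drive Demucs is its CLI. A sketch of building the invocation and locating the output — the `-n`, `-o`, and `--flac` flags follow the Demucs CLI docs, but verify against the installed version; the output layout shown is Demucs's default convention:

```python
import sys
from pathlib import Path

def demucs_command(song: Path, out_dir: Path,
                   model: str = "htdemucs_ft") -> list[str]:
    """Build the Demucs CLI invocation for 4-stem separation.
    Flags assumed from the Demucs docs: -n (model name), -o (output dir),
    --flac (write FLAC stems instead of WAV)."""
    return [sys.executable, "-m", "demucs", "-n", model,
            "-o", str(out_dir), "--flac", str(song)]

def stem_dir(song: Path, out_dir: Path, model: str = "htdemucs_ft") -> Path:
    """Demucs writes stems to <out>/<model>/<song-stem>/ by default."""
    return out_dir / model / song.stem
```

Usage would be roughly `subprocess.run(demucs_command(Path("song.mp3"), Path("04_stems")), check=True)`, after which `stem_dir(...)` holds vocals/drums/bass/other FLACs.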

Tier 3 — Per-Stem Processing Phase 2 · Post-Separation

🥁

Drum Timing Correction

Once the drum stem is isolated, transient detection + quantization can tighten a slightly loose drummer. Tools: madmom for beat detection, librosa for time-stretching with phase vocoder.

Phase 2
🎸

Bass Stem Enhancement

Apply AudioSR or HiFi-GAN BWE to the isolated bass stem. Dramatically more effective than full-mix processing because the AI isn't confused by other instruments.

Phase 2
🎤

Vocal Enhancement

Apply DeepFilterNet or RNNoise to the isolated vocal stem (where speech-trained models actually shine). Then AudioSR for high-frequency detail. Optional: subtle pitch correction.

Phase 2
🎹

Harmonic Stem (Other)

Guitars, keyboards, synths — process with AudioSR for high-frequency extension. This stem benefits most from diffusion-based reconstruction of harmonic overtones.

Phase 2
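The routing behind these four cards reduces to a small dispatch table: each stem name maps to an ordered chain of processing stages. A sketch — the stage functions here are named placeholders for the real model wrappers planned in modules/stem_processor.py:

```python
from typing import Callable

# Each stage is a function audio_path -> audio_path. Placeholders for the
# real wrappers: DeepFilterNet (vocals denoise), AudioSR (diffusion HF),
# HiFi-GAN BWE (fast bandwidth extension), transient sharpening (drums).
def deepfilter(p: str) -> str: return p
def audiosr(p: str) -> str: return p
def hifigan_bwe(p: str) -> str: return p
def sharpen_transients(p: str) -> str: return p

ROUTES: dict[str, list[Callable[[str], str]]] = {
    "vocals": [deepfilter, audiosr],
    "drums":  [sharpen_transients],
    "bass":   [hifigan_bwe],
    "other":  [hifigan_bwe],
}

def process_stem(name: str, path: str) -> str:
    """Pass one stem through its model chain, in order."""
    for stage in ROUTES[name]:
        path = stage(path)
    return path
```

Keeping the routing in a plain dict makes it trivial to expose per-stem toggles in the interactive menu later.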

Tier 4 — Mix & Mastering Phase 3

🎛️

Stem Remixing

After processing each stem independently, recombine them with the soundfile and numpy libraries. Apply individual volume balancing. The remixed result is often noticeably better than the original mix.

Phase 3
📻

AI Mastering

Final-stage loudness normalization, EQ, stereo width enhancement. Python tools: pyloudnorm for LUFS normalization. Or integrate with LANDR API or iZotope Ozone (if available).

Phase 3
📐

LUFS Normalization

Normalize final output to streaming standard (−14 LUFS integrated). Ensures consistent playback volume across all restored tracks. Free, simple, measurable.

Phase 3
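The LUFS step is mostly a measured gain shift. In the planned tool the measurement would come from pyloudnorm's BS.1770 meter (`pyln.Meter(rate)`, `meter.integrated_loudness(audio)`, `pyln.normalize.loudness(...)`); the gain math itself is a pure-numpy sketch:

```python
import numpy as np

def gain_to_target_db(audio: np.ndarray, current_lufs: float,
                      target_lufs: float = -14.0) -> np.ndarray:
    """Apply the linear gain that moves measured loudness to the target.
    `current_lufs` is assumed to come from a BS.1770 meter such as
    pyloudnorm's; a dB difference maps to a linear factor of 10^(dB/20)."""
    gain = 10.0 ** ((target_lufs - current_lufs) / 20.0)
    return audio * gain
```

Note that a pure gain shift preserves dynamic range by construction — no compression is applied, matching the "dynamic range preservation" goal in Step 06.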

The Master Pipeline — Complete Processing Chain

The complete vision — all phases combined — processes each MP3 through a multi-stage chain. Not all stages are required for every song; the tool will offer granular control over which stages to enable.

STEP 01
Decode
MP3 → 24-bit 48 kHz WAV baseline. No AI, just clean decode via FFmpeg.
STEP 02
AudioSR
Full-mix high-frequency super-resolution via diffusion model. Best quality HF reconstruction.
STEP 03
Demucs
Separate into 4 stems: Vocals · Drums · Bass · Other. Applied to AudioSR output.
STEP 04
Per-Stem AI
Each stem processed independently: HiFi-GAN BWE + optional DeepFilter on vocals + timing on drums.
STEP 05
Remix
Recombine processed stems into stereo mix. Balance levels. Apply stereo width enhancement.
STEP 06
Master
LUFS normalization → −14 LUFS. Dynamic range preservation. Final FLAC-12 output at 48 kHz / 24-bit.

The variants model: Rather than picking one "best" chain, the tool produces multiple variants at key decision points so you can compare and choose the best result for each individual song. Music is subjective — what sounds best for a jazz recording differs from what sounds best for heavy metal.

Output File Structure

# Song: "Aircraft Carrier.mp3"

AI_Restored_Variants/
  01_baseline/            ← Clean decode, no AI (reference)
    Aircraft-Carrier.flac

  02_audiosr_direct/      ← AudioSR on full mix (Phase 1 ✓)
    Aircraft-Carrier.up.flac

  03_audiosr_lp/          ← Lowpass → AudioSR (Phase 1 ✓)
    Aircraft-Carrier.lp_up.flac

  04_stems/               ← Demucs 4-stem separation (Phase 2)
    vocals.flac
    drums.flac
    bass.flac
    other.flac

  05_stems_enhanced/      ← Per-stem AI processing (Phase 2)
    vocals.enhanced.flac
    drums.enhanced.flac
    bass.enhanced.flac
    other.enhanced.flac

  06_remixed/             ← Stems recombined (Phase 2)
    Aircraft-Carrier.remix.flac

  07_mastered/            ← LUFS normalized, final (Phase 3)
    Aircraft-Carrier.master.flac

  _done/                  ← Resume markers (JSON fingerprints)
  _logs/                  ← Per-file processing logs

The Unified Tool — Architecture & Menu Design

The goal is a single Python script — mp3restore.py — with a colorful interactive batch menu. No Windows GUI. All control via keyboard. Works from CMD or PowerShell. The menu drives everything, with full CLI pass-through for automation.

Proposed Main Menu

Module Architecture

🧱

mp3restore.py — Main entry

Menu, CLI argument parsing, hardware detection, file scanning, progress tracking, resume logic. All color output. Imports the modules below.

Core
🔊

modules/audiosr_runner.py

Wraps AudioSR CLI with chunking, crossfade stitching, GPU→CPU fallback. Carries forward all logic from current mp3wav_cla.py.

Phase 1 done

modules/hifigan_runner.py

In-process HiFi-GAN BWE. Fast alternative to AudioSR. Model loaded once, all chunks processed without subprocess overhead.

Phase 1 done (gem)
🎭

modules/demucs_runner.py

Wraps Demucs htdemucs_ft. Splits each song to 4 stems, saves as individual 24-bit FLACs. The critical enabler for all Phase 2 work.

Phase 2 — to build
🎛️

modules/stem_processor.py

Per-stem AI enhancement. Routes each stem to the optimal model: vocals → DeepFilter+AudioSR, bass/other → HiFi-GAN, drums → transient sharpening.

Phase 2 — to build
🎚️

modules/mixer_master.py

Recombines stems, applies LUFS normalization (pyloudnorm), stereo width enhancement, final FLAC-12 encode. Target: −14 LUFS for streaming.

Phase 3 — to build

Key Engineering Principles


Roadmap — Build Order & Priorities

# | Task | Phase | Effort | Status
01 | Fix batch file ANSI escape code input bug | 1 | Trivial (5 min) | Quick fix
02 | Fix AudioSR model-reload-per-chunk (in-process wrapper) | 1 | Medium (1–2 h) | High value
03 | Finish current 10-song AudioSR batch | 1 | ~3–4 days GPU | Running
04 | Install & test Demucs htdemucs_ft | 2 | 1–2 h setup | Next up
05 | Build demucs_runner.py module | 2 | Half day | Planned
06 | Build stem_processor.py — per-stem routing logic | 2 | 1 day | Planned
07 | Build mixer_master.py — recombine + LUFS normalize | 3 | Half day | Future
08 | Unified mp3restore.py with full interactive menu | All | 1–2 days | Future
09 | Drum timing correction module (madmom + librosa) | 2+ | 1 day R&D | Research
10 | Batch compare/report — side-by-side variant quality metrics | 3 | Half day | Future

Immediate Next Step

Fix the model-reload bug first. The single biggest quality-of-life improvement available right now — before Phase 2 even starts — is rewriting the AudioSR stage to run in-process, the way HiFi-GAN does in mp3wav_gem.py. That eliminates roughly 10–15 seconds of Python startup and model loading per chunk: on a 10-chunk song, 100–150 seconds of pure overhead, and across the full 10-song batch roughly 20–30 minutes of runtime saved at no quality cost.
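The in-process pattern itself is simple, shown here as a generic sketch. For AudioSR specifically, the published audiosr package exposes `build_model` and `super_resolution` as its Python entry points, but treat the exact signatures as an assumption to verify against the installed version:

```python
from typing import Callable, Iterable, TypeVar

Model = TypeVar("Model")
Chunk = TypeVar("Chunk")
Result = TypeVar("Result")

def process_in_process(chunks: Iterable[Chunk],
                       load_model: Callable[[], Model],
                       run_model: Callable[[Model, Chunk], Result]) -> list[Result]:
    """Load the model once, then stream every chunk through it in memory,
    instead of spawning a subprocess (and reloading 258M parameters) per chunk."""
    model = load_model()  # paid once per batch run, not once per chunk
    return [run_model(model, c) for c in chunks]
```

With AudioSR, `load_model` would wrap `audiosr.build_model(...)` and `run_model` would wrap `audiosr.super_resolution(...)` — the chunking and crossfade logic around it stays unchanged.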


Honest Limits — What AI Cannot Recover

This is important context for managing expectations. AI audio tools are impressive, but they operate under fundamental constraints.

Reconstruction is not recovery. When AudioSR "restores" high frequencies, it is generating statistically plausible audio based on patterns learned from training data. The output is not the original recording — it is a new signal that sounds like what the original probably sounded like. On average this is excellent. For specific moments (a particular cymbal hit, a guitar harmonic) it may be wrong in ways that are subtly audible to trained ears.

Cannot: Recover specific lost information

If a specific note's overtones were discarded by MP3, no AI can know what those specific overtones were. It will reconstruct something plausible, not the original truth.

Cannot: Fix poor original recordings

If the recording itself was poorly made — bad mic placement, room acoustics issues, a musician who was genuinely out of sync — AI restoration cannot fix the underlying recording problem.

⚠️

Risk: Hallucination artifacts

Diffusion models can occasionally generate high-frequency content that wasn't in the original at all — musical artifacts that are plausible but wrong. A/B comparison with the baseline is always necessary.

⚠️

Risk: Stem bleed

Demucs is remarkable but not perfect. Some bass frequencies appear in the "other" stem, some drum reverb bleeds into vocals. Per-stem processing must account for this bleed.

⚠️

Risk: DeepFilterNet on music

This model was trained on speech. It may identify musical harmonics and overtones as "noise" and reduce them. Only safe to use on isolated vocal stems, not full mixes. Test carefully.

The honest bottom line

The output will sound significantly better than the original MP3 to most listeners on most material. It will not be identical to the lost original. It is the best reconstruction currently possible.

Conclusion

The MP3 Resurrection Project is a serious, well-motivated application of current AI audio technology. The foundation — mp3wav_cla.py with AudioSR — is already producing high-quality results. Phase 2 (Demucs stem separation + per-stem enhancement) will unlock a qualitatively different level of restoration by allowing targeted AI processing of each instrument independently. Phase 3 (remix + mastering) will bring the final output to modern streaming quality.

The most important discipline throughout is critical listening with A/B comparison. Every variant produced is a hypothesis. The human ear is the final judge.

To continue in a new session: Open this HTML file, share it with Claude, and say: "We made this so far — look at the paper and let's continue." All context, decisions, tool choices, and the build plan are documented here.

8Z Research · AIM³ Institute · Ljubljana, Slovenia · 2026
Authored collaboratively: Bojan Dobrečevič + Claude (Anthropic) · February 26, 2026