Recovering a lifetime of music from lossy MP3 archives — using AI super-resolution, stem separation, and intelligent batch processing to rebuild what compression discarded.
The original recordings were captured at 44.1 kHz, 16-bit — CD quality. They were then encoded to MP3, likely at 128–256 kbps. That encoding process made a series of permanent, one-way decisions about what information was "expendable."
| Damage | Description | Recovery outlook |
|---|---|---|
| High-frequency loss | MP3 aggressively cuts content above 16–18 kHz. Cymbals, string harmonics, the "air" of a recording, the sibilance of vocals — all truncated or smeared by codec artifacts. | Partial recovery possible via AI |
| Transient smearing | Sharp attack sounds — snare hits, plucked strings, consonants — are blurred by MP3's block-based transform (MDCT). The "snap" of a drum becomes a slightly softer thud. | Limited recovery |
| Pre-echo | A characteristic MP3 defect where a faint "ghost" of a transient appears slightly before the hit. Particularly audible in guitar attacks and percussion in quiet passages. | Detectable, partially fixable |
| Codec noise floor | Low-bitrate MP3s introduce a faint but audible noise floor — a mushy, granular texture underneath the music. Especially noticeable in quiet passages and reverb tails. | Good recovery via DeepFilter |
| Stereo collapse | MP3 uses "joint stereo," which can collapse the stereo field at lower bitrates, making everything sound narrower and more mono-like than the original mix. | AI mastering can help |
| Flattened dynamics | The encode/decode cycle introduces subtle changes to the amplitude envelope. Combined with the noise floor, the dynamics feel slightly flattened compared to the original. | Addressable at mastering stage |

Critical reality check: MP3 encoding is a lossy, one-way compression. The original information is permanently gone. AI tools do not "recover" the original — they reconstruct plausible approximations of what was likely there, using models trained on uncompressed audio. The result is a high-quality reconstruction, not a restoration of truth.
A full-featured AI restoration pipeline written by Claude for Bojan. Takes MP3 files and produces multiple comparison variants side-by-side, so you can A/B test which processing chain sounds best for each piece of material.
| Variant Folder | What It Does | Status |
|---|---|---|
| 01_baseline/ | Straight MP3 → 24-bit WAV/FLAC decode. Reference track — no AI. | ✅ Working |
| 02_denoised/ | DeepFilterNet noise suppression. Risky on music — can strip harmonics. Requires Rust toolchain. | ⚠️ Disabled (Rust) |
| 03_upscaled/ | AudioSR super-resolution directly on the decoded MP3. Reconstructs high-frequency content. | ✅ Running |
| 04_upscaled_lp/ | Applies a 16 kHz lowpass filter first, then AudioSR. Gives AI a cleaner base to work from. | ✅ Running |
| 05_full_chain/ | DeepFilterNet → AudioSR combo. Best expected quality. Requires DeepFilterNet to be enabled. | ⚠️ Skipped (no DF) |
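The 16 kHz pre-lowpass behind 04_upscaled_lp/ can be illustrated with a simple FFT brickwall. This is a minimal numpy sketch of the idea, not the script's actual filter (which the source does not specify); a production filter would more likely be a Butterworth or windowed-sinc design to avoid ringing.

```python
import numpy as np

def lowpass_fft(x: np.ndarray, sr: int, cutoff_hz: float = 16_000.0) -> np.ndarray:
    """Brickwall lowpass: zero every FFT bin above cutoff_hz.

    This removes the mushy residue MP3 leaves near its cutoff, giving the
    super-resolution model a clean band edge to extend from.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))
```

The rationale is the one stated in the table: a clean band edge is an easier starting point for AudioSR than codec-smeared partial highs.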
Key engineering decisions in this tool: resumable runs with .done JSON markers,
atomic FLAC writes (encode to .tmp, rename only on success),
VRAM-adaptive chunking (auto-detects GPU and picks safe chunk sizes),
crossfade stitching (2s overlap to eliminate boundary clicks), and
GPU → CPU fallback if CUDA OOMs.
Gemini's alternative implementation. Uses HiFi-GAN BWE (Bandwidth Extender) instead of AudioSR — a different neural architecture for high-frequency reconstruction. Key differences: processes audio in-process (no subprocess spawning, no model reload per chunk), uses 12-second chunks with crossfade, simpler single-output structure.
Important difference: HiFi-GAN BWE runs the model in-process, loading it once and passing all chunks through it in memory. AudioSR spawns a new subprocess per chunk, reloading the 258M-parameter model each time. HiFi-GAN is therefore much faster per song but may produce lower quality reconstruction than AudioSR's diffusion-based approach.
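The crossfade stitching both implementations rely on reduces to a gain-complementary overlap. A minimal numpy sketch with a linear fade (the fade shape used by the actual scripts is not specified here):

```python
import numpy as np

def stitch(chunks: list[np.ndarray], overlap: int) -> np.ndarray:
    """Concatenate chunks, linearly crossfading over `overlap` samples.

    At each seam the outgoing chunk fades out while the incoming chunk
    fades in; the two gains sum to 1.0, so boundary clicks vanish.
    """
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    out = chunks[0].astype(np.float64)
    for nxt in chunks[1:]:
        nxt = nxt.astype(np.float64)
        seam = out[-overlap:] * fade_out + nxt[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], seam, nxt[overlap:]])
    return out
```

At 48 kHz output, the 2-second overlap mentioned above corresponds to `overlap = 96_000` samples.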
The RTX 3080 Mobile in the Razer Blade Advanced 2021 runs at 105W TDP — roughly equivalent to a desktop RTX 3060/3070 in real compute throughput. AudioSR's 50-step DDIM diffusion sampling takes ~14–20 minutes per 60-second chunk on this hardware. A 9-minute song processed twice (direct + lowpass variant) takes approximately 6–8 hours of GPU time.
Known bug — input path: The batch file passes an ANSI escape code
(\x1b[90m) as the input argument. The script catches this and falls back
to the script directory, but it should be fixed in the batch file by stripping color
codes before passing the path argument.
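Until the batch file is fixed, a script-side defense is to strip ANSI sequences from the argument before validating it. A small sketch (the script's existing fallback logic is described above; `clean_path_arg` is a hypothetical helper name):

```python
import re

# CSI sequences such as \x1b[90m (color) or \x1b[0m (reset)
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def clean_path_arg(raw: str) -> str:
    """Remove ANSI escape codes and surrounding whitespace from a CLI argument."""
    return ANSI_RE.sub("", raw).strip()
```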
AudioSR and HiFi-GAN BWE address only one dimension of what MP3 encoding damaged: the high-frequency spectrum. A complete restoration needs to address every layer of the signal.
| Signal layer | What needs addressing | Restoration approach |
|---|---|---|
| High frequencies (8–24 kHz) | Cymbals, air, shimmer. | Reconstructed by AudioSR / HiFi-GAN |
| Bass (20–200 Hz) | MP3 is actually fairly good here, but the codec noise floor affects the low end too. | Dedicated bass enhancement or stem processing |
| Timing | Individual musicians can play slightly off the grid. MP3 encoding doesn't fix or worsen this — it's a recording characteristic — but once stems are separated, timing correction on individual instruments is possible. | Requires stem separation first |
| Dynamics | The overall loudness balance, compression, and dynamic range of the mixed track. AI mastering tools can intelligently adjust these to modern standards without squashing the life out of the music. | AI mastering stage |
| Stereo image | Width, depth, and placement of instruments in the stereo image. MP3 joint stereo can narrow this. AI tools and mid-side processing can widen and restore spatial character. | Mastering + stem processing |
| Codec artifacts | The distinctive "MP3 sound" — pre-echo, ringing, granular noise in reverb tails. These are not simply noise; they are structured artifacts that require specialized removal tools. | DeepFilterNet / iZotope RX |
| Pitch | MP3 does not affect pitch — this is a recording characteristic. However, auto-tune or pitch correction on vocals or lead instruments is possible once stems are separated. | Optional, post-separation |

| Tool | Model Type | Speed | Quality | Notes |
|---|---|---|---|---|
| AudioSR | Diffusion (50-step DDIM) | ~15 min / 60s chunk | ⭐⭐⭐⭐⭐ | Best quality, slowest. 258M params. Subprocess per chunk = model reload overhead. |
| HiFi-GAN BWE | GAN (single pass) | ~30s / 12s chunk | ⭐⭐⭐⭐ | Much faster, good quality. In-process, no reload. Gemini's choice. 48kHz output. |
| DeepFilterNet 3 | RNN + spectral | Real-time | ⭐⭐⭐ | Noise suppression, not upscaling. Speech-trained — can damage music harmonics. Use carefully. |
Stem separation is the key unlocking step for the entire advanced pipeline. Once a song is split into individual stems, every other processing step becomes dramatically more effective and controllable — you process each instrument optimally, then remix.
| Tool | Stems | Quality | Speed | Notes |
|---|---|---|---|---|
| Demucs htdemucs_ft | Vocals, Drums, Bass, Other | ⭐⭐⭐⭐⭐ | ~2–5 min/song GPU | Meta AI. State of the art. Free, open source. pip install demucs. Best overall. |
| Demucs htdemucs_6s | + Guitar, Piano (6 stems) | ⭐⭐⭐⭐ | ~3–6 min/song GPU | 6-stem variant. More useful for complex arrangements. Slightly lower per-stem quality. |
| Spleeter (Deezer) | 2, 4, or 5 stems | ⭐⭐⭐ | Very fast | Older, lower quality than Demucs. Not recommended for quality-focused work. |
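A thin wrapper for the planned separation step could shell out to the Demucs CLI. The command shape below uses Demucs's documented `-n` (model) and `-o` (output directory) flags; `separate_song` and the stem-gathering logic are a hypothetical sketch, not the project's actual module.

```python
import subprocess
from pathlib import Path

def demucs_cmd(mp3: Path, out_dir: Path, model: str = "htdemucs_ft") -> list[str]:
    """Build the Demucs CLI invocation for one song."""
    return ["demucs", "-n", model, "-o", str(out_dir), str(mp3)]

def separate_song(mp3: Path, out_dir: Path, model: str = "htdemucs_ft") -> dict[str, Path]:
    """Run Demucs and return the four stem paths it writes.

    Demucs writes stems under <out_dir>/<model>/<track-name>/.
    """
    subprocess.run(demucs_cmd(mp3, out_dir, model), check=True)
    stem_dir = out_dir / model / mp3.stem
    return {name: stem_dir / f"{name}.wav"
            for name in ("vocals", "drums", "bass", "other")}
```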
On your question about bass: Once Demucs separates the bass stem, you can process it independently — apply HiFi-GAN or AudioSR to the isolated bass track (much better results than on a full mix), optionally apply a bass enhancer, then mix back. The bass in an MP3 is actually less damaged than the highs, but isolation + enhancement still yields audible improvement.
Once the drum stem is isolated, transient detection + quantization can tighten a slightly loose drummer. Tools: madmom for beat detection, librosa for time-stretching with phase vocoder.
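Once onsets are detected (e.g. by madmom), the quantization math itself is simple. A hypothetical numpy sketch that snaps onset times part-way toward a 16th-note grid; the actual time-stretching via librosa's phase vocoder is out of scope here.

```python
import numpy as np

def snap_onsets(onsets_s: np.ndarray, bpm: float, strength: float = 0.5) -> np.ndarray:
    """Move each onset toward the nearest 16th-note grid line.

    strength=0.0 leaves timing untouched, 1.0 hard-quantizes. Partial
    strength preserves human feel while tightening obvious stragglers.
    """
    grid_step = 60.0 / bpm / 4.0            # 16th-note duration in seconds
    nearest = np.round(onsets_s / grid_step) * grid_step
    return onsets_s + strength * (nearest - onsets_s)
```

The per-hit offsets (snapped minus original) then become the time-stretch targets for the phase-vocoder move.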
- **Bass stem (Phase 2):** Apply AudioSR or HiFi-GAN BWE to the isolated bass stem. Dramatically more effective than full-mix processing because the AI isn't confused by other instruments.
- **Vocal stem (Phase 2):** Apply DeepFilterNet or RNNoise to the isolated vocal stem (where speech-trained models actually shine). Then AudioSR for high-frequency detail. Optional: subtle pitch correction.
- **"Other" stem (Phase 2):** Guitars, keyboards, synths — process with AudioSR for high-frequency extension. This stem benefits most from diffusion-based reconstruction of harmonic overtones.
- **Remix (Phase 2):** After processing each stem independently, recombine them using Python's soundfile + numpy. Apply individual volume balancing. The remixed result is often noticeably better than the original mix.
- **Mastering (Phase 3):** Final-stage loudness normalization, EQ, stereo width enhancement. Python tools: pyloudnorm for LUFS normalization. Or integrate with the LANDR API or iZotope Ozone (if available).
- **Loudness target (Phase 3):** Normalize final output to the streaming standard (−14 LUFS integrated). Ensures consistent playback volume across all restored tracks. Free, simple, measurable.

The complete vision — all phases combined — processes each MP3 through a multi-stage chain. Not all stages are required for every song; the tool will offer granular control over which stages to enable.
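The remix step described above (soundfile + numpy) reduces to a gain-weighted sum plus a safety check against clipping. A minimal sketch, in which `remix`, the `gains` dict, and the peak guard are illustrative choices; pyloudnorm would handle the real loudness work afterward.

```python
import numpy as np

def remix(stems: dict[str, np.ndarray], gains: dict[str, float]) -> np.ndarray:
    """Sum processed stems with per-stem gains, then protect against clipping."""
    mix = sum(gains.get(name, 1.0) * audio for name, audio in stems.items())
    peak = np.max(np.abs(mix))
    if peak > 1.0:            # crude peak guard; real mastering uses a limiter
        mix = mix / peak
    return mix
```

In the real pipeline each stem would be read with `soundfile.read` and the result written back as 24-bit FLAC with `soundfile.write`.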
The variants model: Rather than picking one "best" chain, the tool produces multiple variants at key decision points so you can compare and choose the best result for each individual song. Music is subjective — what sounds best for a jazz recording differs from what sounds best for heavy metal.
    # Song: "Aircraft Carrier.mp3"
    AI_Restored_Variants/
        01_baseline/          ← Clean decode, no AI (reference)
            Aircraft-Carrier.flac
        02_audiosr_direct/    ← AudioSR on full mix (Phase 1 ✓)
            Aircraft-Carrier.up.flac
        03_audiosr_lp/        ← Lowpass → AudioSR (Phase 1 ✓)
            Aircraft-Carrier.lp_up.flac
        04_stems/             ← Demucs 4-stem separation (Phase 2)
            vocals.flac  drums.flac  bass.flac  other.flac
        05_stems_enhanced/    ← Per-stem AI processing (Phase 2)
            vocals.enhanced.flac  drums.enhanced.flac  bass.enhanced.flac  other.enhanced.flac
        06_remixed/           ← Stems recombined (Phase 2)
            Aircraft-Carrier.remix.flac
        07_mastered/          ← LUFS normalized, final (Phase 3)
            Aircraft-Carrier.master.flac
        _done/                ← Resume markers (JSON fingerprints)
        _logs/                ← Per-file processing logs
The goal is a single Python script — mp3restore.py — with a colorful interactive
batch menu. No Windows GUI. All control via keyboard. Works from CMD or PowerShell.
The menu drives everything, with full CLI pass-through for automation.
- **Main script (mp3restore.py):** Menu, CLI argument parsing, hardware detection, file scanning, progress tracking, resume logic. All color output. Imports the modules below. Status: Core.
- **AudioSR runner:** Wraps the AudioSR CLI with chunking, crossfade stitching, GPU→CPU fallback. Carries forward all logic from the current mp3wav_cla.py. Status: Phase 1 done.
- **HiFi-GAN runner:** In-process HiFi-GAN BWE. Fast alternative to AudioSR. Model loaded once, all chunks processed without subprocess overhead. Status: Phase 1 done (gem).
- **demucs_runner.py:** Wraps Demucs htdemucs_ft. Splits each song to 4 stems, saves them as individual 24-bit FLACs. The critical enabler for all Phase 2 work. Status: Phase 2 — to build.
- **stem_processor.py:** Per-stem AI enhancement. Routes each stem to the optimal model: vocals → DeepFilter+AudioSR, bass/other → HiFi-GAN, drums → transient sharpening. Status: Phase 2 — to build.
- **mixer_master.py:** Recombines stems, applies LUFS normalization (pyloudnorm), stereo width enhancement, final FLAC-12 encode. Target: −14 LUFS for streaming. Status: Phase 3 — to build.

Reliability guarantees carried through every module:

- Every completed step writes a .done.json fingerprint. Crash → restart → the run continues where it left off automatically.
- Output is written to .tmp first and renamed to its final name only on verified success. No half-written files.

| # | Task | Phase | Effort | Status |
|---|---|---|---|---|
| 01 | Fix batch file ANSI escape code input bug | 1 | Trivial (5 min) | Quick fix |
| 02 | Fix AudioSR model-reload-per-chunk (in-process wrapper) | 1 | Medium (1–2h) | High value |
| 03 | Finish current 10-song AudioSR batch | 1 | ~3–4 days GPU | Running |
| 04 | Install & test Demucs htdemucs_ft | 2 | 1–2h setup | Next up |
| 05 | Build demucs_runner.py module | 2 | Half day | Planned |
| 06 | Build stem_processor.py — per-stem routing logic | 2 | 1 day | Planned |
| 07 | Build mixer_master.py — recombine + LUFS normalize | 3 | Half day | Future |
| 08 | Unified mp3restore.py with full interactive menu | All | 1–2 days | Future |
| 09 | Drum timing correction module (madmom + librosa) | 2+ | 1 day R&D | Research |
| 10 | Batch compare/report — side-by-side variant quality metrics | 3 | Half day | Future |
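For task 10, one cheap objective metric is the high-band energy ratio: how much energy a variant carries above 16 kHz relative to the baseline decode. A hypothetical numpy sketch (a real report would add spectrogram images and loudness statistics):

```python
import numpy as np

def high_band_ratio(variant: np.ndarray, baseline: np.ndarray, sr: int,
                    cutoff_hz: float = 16_000.0) -> float:
    """Energy above cutoff_hz in `variant` divided by the same band in `baseline`.

    Values well above 1.0 mean the AI added high-frequency content;
    the number says nothing about whether that content sounds right.
    """
    def band_energy(x: np.ndarray) -> float:
        spectrum = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
        return float(np.sum(spectrum[freqs > cutoff_hz] ** 2))
    return band_energy(variant) / max(band_energy(baseline), 1e-12)
```

Metrics like this complement, but never replace, the A/B listening the document insists on.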
Fix the model-reload bug first. The single biggest quality-of-life improvement available right now, before Phase 2 even starts, is rewriting the AudioSR wrapper to run in-process, as HiFi-GAN does in mp3wav_gem.py. This eliminates the ~10–15 seconds of Python startup and model loading per chunk: on a 10-chunk song that is 100–150 seconds of pure overhead, and across the full 10-song batch roughly 17–25 minutes of total runtime. Against AudioSR's ~14–20 minutes of actual processing per chunk the percentage is modest, but it is free speed at no quality cost.
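The load-once pattern the rewrite would adopt is straightforward. Below is the shape of it, with `load_audiosr_model` standing in for whatever the real AudioSR Python API exposes; this illustrates the pattern, not AudioSR's actual interface.

```python
class InProcessUpscaler:
    """Load the heavy model once; reuse it for every chunk of every song."""

    def __init__(self, load_model):
        self._load_model = load_model
        self._model = None
        self.load_count = 0           # instrumentation: proves a single load

    def _get_model(self):
        if self._model is None:       # only the first call pays the load cost
            self._model = self._load_model()
            self.load_count += 1
        return self._model

    def process(self, chunk):
        return self._get_model()(chunk)

# Stand-in for the expensive 258M-parameter load:
def load_audiosr_model():
    return lambda chunk: chunk        # identity "model" for illustration
```

Contrast this with the current subprocess-per-chunk flow, which pays the full interpreter startup and model load on every chunk.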
This is important context for managing expectations. AI audio tools are impressive, but they operate under fundamental constraints.
Reconstruction is not recovery. When AudioSR "restores" high frequencies, it is generating statistically plausible audio based on patterns learned from training data. The output is not the original recording — it is a new signal that sounds like what the original probably sounded like. On average this is excellent. For specific moments (a particular cymbal hit, a guitar harmonic) it may be wrong in ways that are subtly audible to trained ears.
If a specific note's overtones were discarded by MP3, no AI can know what those specific overtones were. It will reconstruct something plausible, not the original truth.
If the recording itself was poorly made — bad mic placement, room acoustics issues, a musician who was genuinely out of sync — AI restoration cannot fix the underlying recording problem.
Diffusion models can occasionally generate high-frequency content that wasn't in the original at all — musical artifacts that are plausible but wrong. A/B comparison with the baseline is always necessary.
Demucs is remarkable but not perfect. Some bass frequencies appear in the "other" stem, some drum reverb bleeds into vocals. Per-stem processing must account for this bleed.
This model was trained on speech. It may identify musical harmonics and overtones as "noise" and reduce them. Only safe to use on isolated vocal stems, not full mixes. Test carefully.
The output will sound significantly better than the original MP3 to most listeners on most material. It will not be identical to the lost original. It is the best reconstruction currently possible.
The MP3 Resurrection Project is a serious, well-motivated application of current AI audio technology.
The foundation — mp3wav_cla.py with AudioSR — is already producing high-quality results.
Phase 2 (Demucs stem separation + per-stem enhancement) will unlock a qualitatively different
level of restoration by allowing targeted AI processing of each instrument independently.
Phase 3 (remix + mastering) will bring the final output to modern streaming quality.
The most important discipline throughout is critical listening with A/B comparison. Every variant produced is a hypothesis. The human ear is the final judge.
To continue in a new session: Open this HTML file, share it with Claude, and say: "We made this so far — look at the paper and let's continue." All context, decisions, tool choices, and the build plan are documented here.