Dev Log: January 4, 2026
podcast-summarizer-v2
A deep day of memory debugging and architecture work on the podcast summarizer. The session started with frontend persistence patterns for job validation state, then pivoted hard into diagnosing and fixing OOM crashes in the transcription pipeline. The root cause turned out to be faster-whisper decoding entire audio files into PCM buffers (up to 920MB for long episodes), which, combined with model weights and inference buffers, exceeded the 8GB container limit. The fix was audio chunking: splitting episodes into 20-minute segments before they reach faster-whisper, bounding GPU memory to roughly 77MB per chunk. A Codex consultation caught a critical bug where the fallback path would have silently reintroduced the OOM risk. After stabilizing transcription, the focus shifted to YouTube channel integration: episode discovery, filtering private videos at the provider level, and building an admin workflow for pending episodes. The day wrapped with UI polish on the admin panel, including editable URLs with audit logging.
How localStorage persistence works here:
- Lazy initialization - `useState(() => loadValidationState()?.channelId)` reads from localStorage only once on mount, avoiding unnecessary reads on re-renders
- Atomic updates - `setValidationState()` updates both React state and localStorage together, preventing inconsistencies
- Self-healing - If you return to a stale job (already completed), the polling hook fetches the terminal status, triggers the completion effect, and clears the stale localStorage entry
The audio files are 150-272MB in size. The download_audio_by_path method loads the entire file into RAM before writing to disk. Combined with the Whisper model (~2GB), this exceeds the 8GB container limit during transcription.
- Model weights are NOT the main problem - small model is only ~0.45GB on disk
- Inference buffers are the killer - CTranslate2 allocates ~1.5-2GB of working memory during decode
- Audio-in-memory compounds the issue - the Dec 29 change added 150-272MB on top of peak
- We’re operating at the edge - 8GB with ~2GB headroom, any spike = OOM
Key learning from Codex consultation: The real memory killer wasn’t the model (0.45GB) or even the audio file (272MB). It was faster-whisper decoding the entire audio into PCM buffers (~920MB for 4 hours). Chunking the audio before it reaches faster-whisper bounds this to ~77MB per chunk.
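Those two numbers fall straight out of the PCM format Whisper-family models decode to (16 kHz mono, float32 samples); a quick back-of-envelope check:

```python
SAMPLE_RATE = 16_000   # Whisper operates on 16 kHz mono audio
BYTES_PER_SAMPLE = 4   # float32 PCM

def pcm_bytes(seconds: float) -> int:
    """Size of the decoded PCM buffer for a clip of this length."""
    return int(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)

print(pcm_bytes(4 * 3600) / 1e6)  # ~920 MB for a 4-hour episode
print(pcm_bytes(20 * 60) / 1e6)   # ~77 MB for a 20-minute chunk
```

So chunk size is the only knob: the per-chunk buffer scales linearly with chunk duration, independent of episode length.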
The Codex review caught a critical issue: my original fallback to direct transcription when chunking failed would have reintroduced the OOM risk for the exact files that need chunking most. Always fail hard when the mitigation fails - don’t silently bypass it.
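That rule can live in the control flow itself. A minimal sketch (the two callables stand in for the real ffmpeg-splitting and faster-whisper stages; names are hypothetical):

```python
class ChunkingError(RuntimeError):
    """Raised when audio chunking fails; transcription must not proceed."""

def transcribe_episode(path, split_into_chunks, transcribe_chunk):
    """Chunk first, then transcribe sequentially.

    `split_into_chunks` and `transcribe_chunk` are placeholders for the
    real pipeline stages.
    """
    try:
        chunks = split_into_chunks(path)
    except Exception as exc:
        # Fail hard: silently falling back to whole-file transcription
        # would reintroduce the OOM risk for exactly the files that
        # need chunking most.
        raise ChunkingError(f"chunking failed for {path}") from exc
    return " ".join(transcribe_chunk(c) for c in chunks)
```

The point is structural: there is no code path from "chunking failed" to "decode the whole file anyway".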
The audio chunking feature (from commit e36b53d) is working correctly. Previously, episodes like the 72.6-min Trump interview would’ve caused OOM errors. Now they’re split into 20-min chunks and processed sequentially, keeping GPU memory usage bounded.
The processor uses a batch-then-deliver pattern rather than per-episode delivery. This is a deliberate design choice that:
- Prioritizes throughput (don’t context-switch between GPU work and network I/O)
- Keeps the expensive GPU loaded for transcription as long as possible
- Sends all emails at the end when compute is done
The downside: if the job crashes mid-batch, no emails are sent even for completed episodes.
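The pattern, sketched with stand-in callables for the GPU and email stages (names hypothetical):

```python
def run_batch(episodes, transcribe, send_email):
    """Batch-then-deliver: finish all GPU work before any network I/O.

    Trade-off: if this crashes mid-batch, no emails go out, even for
    episodes whose transcription already completed.
    """
    # Phase 1: keep the GPU loaded and busy, no context-switching to I/O
    results = [(ep, transcribe(ep)) for ep in episodes]
    # Phase 2: deliver everything once compute is done
    for ep, summary in results:
        send_email(ep, summary)
    return results
```

A crash-tolerant variant would persist each result as it completes and make delivery a separate, resumable step, at the cost of interleaving I/O with GPU work.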
- `extract_flat: True` trades metadata completeness for speed. It's great for getting video IDs quickly, but loses date/duration info.
- Getting actual upload dates requires per-video API calls (much slower).
- For re-uploaded content like podcasts on your channel, even the `upload_date` would be wrong - it's the date you uploaded to YouTube, not the original release date.
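The trade-off as yt-dlp options (a config sketch only; which fields the flat entries carry varies by extractor and yt-dlp version):

```python
# Fast discovery pass: flat extraction returns lightweight entry stubs
# (id, title) without resolving each video, so upload_date and duration
# are typically missing from the results.
YDL_FLAT_OPTS = {
    "extract_flat": True,
    "skip_download": True,
    "quiet": True,
}

# Getting real upload dates would mean a second per-video
# extract_info() call -- much slower for large channels, and for
# re-uploads the date would still be the YouTube upload date, not the
# original release date.
```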
Using “discovery time” instead of “upload date” for YouTube episodes is actually better for your use case:
- For new episodes: Discovery time ≈ upload time (poller runs every 12h)
- For backfilled episodes: All get the same timestamp, but at least they’re consistent
- For re-uploaded content: You avoid the problem of showing the re-upload date instead of original release
Filtering at the provider level (rather than discovery) ensures:
- Private videos never enter the system, regardless of which code path discovers them
- The filtering logic is in one place, close to where yt-dlp data is parsed
- The debug log helps track how many videos are being filtered during discovery runs
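A sketch of that provider-level filter. The marker fields are assumptions about yt-dlp's flat output (in practice inaccessible entries surface with substitute titles like `[Private video]`, and fuller extractions carry an `availability` field):

```python
import logging

logger = logging.getLogger("youtube_provider")

# Titles yt-dlp substitutes for inaccessible videos in flat listings.
UNAVAILABLE_TITLES = {"[Private video]", "[Deleted video]"}

def filter_available(entries: list[dict]) -> list[dict]:
    """Drop private/deleted videos where the yt-dlp data is parsed,
    so they never enter the system via any discovery code path."""
    kept = [
        e for e in entries
        if e.get("title") not in UNAVAILABLE_TITLES
        and e.get("availability") in (None, "public")
    ]
    logger.debug(
        "discovery: filtered %d of %d videos",
        len(entries) - len(kept), len(entries),
    )
    return kept
```

Because every discovery path funnels through this one function, a new code path can't accidentally let private videos in.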
- The editable URL only sends to backend if actually changed (avoiding unnecessary data)
- Backend logs when admin overrides URL for audit trail
- Shows “Original: {url}” below input so admin can reference what user submitted
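The backend half of that, roughly (field names and the dict-shaped episode are hypothetical stand-ins for the real model):

```python
import logging

logger = logging.getLogger("admin")

def apply_url_override(episode: dict, new_url: str, admin_id: str) -> bool:
    """Persist an admin URL override, keeping an audit trail.

    Returns False when the URL is unchanged -- the frontend already
    skips the request in that case, so this is just a backstop.
    """
    if new_url == episode["url"]:
        return False
    logger.info(
        "admin %s overrode episode %s URL: %r -> %r",
        admin_id, episode["id"], episode["url"], new_url,
    )
    episode["original_url"] = episode["url"]  # what the user submitted
    episode["url"] = new_url
    return True
```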
The workflow for YouTube channels is:
- Discovery adds episodes
- Episodes without captions/audio appear at `/admin/youtube/pending`
- Run `python scripts/download_youtube.py` to batch download+upload
- Batch processor processes them normally
The script handles everything: yt-dlp download → blob upload → mark episode ready.
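The script's shape, with the three stages as stand-in callables (names hypothetical; the real script wires in yt-dlp, the blob client, and the database):

```python
def prepare_episode(episode_id, download, upload, mark_ready):
    """yt-dlp download -> blob upload -> mark episode ready.

    The ordering matters: the episode only flips to ready after the
    blob upload succeeds, so the batch processor never sees an episode
    whose audio isn't actually in storage yet.
    """
    local_path = download(episode_id)   # yt-dlp fetches the audio
    blob_url = upload(local_path)       # push to blob storage
    mark_ready(episode_id, blob_url)    # batch processor can pick it up
    return blob_url
```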