Dev Log: January 6, 2026
podcast-summarizer-v2
Built out the distributed job processing infrastructure for the podcast summarizer. Started by discovering that the existing BatchState table and Delivery model already had most of the schema foundations needed, then initialized alembic and created migrations for lease columns, expired status, and batch state seeding. Implemented a LeaseManager with atomic acquire/heartbeat/release operations, a CAS-based orchestrator wakeup mechanism with throttling, a sweeper for stuck deliveries, and the orchestrator router itself. Later adapted the codebase to match the actual user tier values ("premium" vs "paid") and extended the existing multi-stage Dockerfile rather than creating separate files.
Key architectural context discovered:
BatchStatetable already exists withkey,value_int,value_textcolumns - perfect for lease storage without schema changesDeliverymodel hasvalidation_job_idfor priority signaling - the foundation for validation priority is already in place- No alembic migrations directory exists - will need to initialize alembic first
Phase 1 Complete! Schema foundation laid:
- 3 migrations for lease columns, expired status, and batch_state seeding
- Delivery model now has
processing_owner_id,processing_started_at,processing_lease_expires_at,processing_attempts batch_staterows seeded for orchestrator and transcriber leases- Total: 3/33 tasks done (9%)
Clock-skew tradeoff noted:
- Plan specified
SYSUTCDATETIME()for production DB time - Implementation uses Python
time.time()for SQLite compatibility - Acceptable because: 30-min TTL >> typical Azure clock drift (~ms)
- Could be enhanced later with dialect detection if needed
Phase 2 Complete! LeaseManager API:
acquire(key, owner_id)- Atomic acquisition with single-holder guard (M1, M4)heartbeat(key, owner_id)- Extend lease during long operationsrelease(key, owner_id)- Explicit lease release- All support concurrent access with atomic UPDATE (M27)
Phase 3 Complete! Wakeup CAS API:
trigger_orchestrator_if_idle(db, request_id)- Atomic CAS trigger- 60-second throttle window prevents rapid re-triggers
- Rollback on failure enables retry (M5, M6, M7, M28)
Phase 4 Complete! Sweeper API:
sweep_stuck_deliveries(db)- Requeues expired or fails max-attempts- Hard timeout (60min) catches misconfigured heartbeats
- Sweeper runs at orchestrator start for self-healing (M19, M20)
Phase 5 Complete! Orchestrator API:
run_orchestrator(db)- Router-only: doesn’t claim, just triggers workerscount_gpu_work()/count_cpu_work()- Check for work needing GPU/CPUexpire_stale_deliveries()- TTL cleanup for 7-day-old deliveries- Tests: M8, M9, M10, M23 all passing
User model has tier field - but uses "free" and "premium" values (not "paid" as in plan). The implementer needs to adapt the tests to use "premium" instead of "paid".
Existing multi-stage Dockerfile - The project already has a sophisticated Dockerfile with api, jobs, and gpu-base stages. Plan suggests separate files, but extending the existing pattern is cleaner.