Dev Log: February 4, 2026
tools
Ran a smoke test validating the full async podcast pipeline end-to-end, from job enqueue through worker pickup, generation attempt, and failure handling with atomic job state transitions.
The smoke test validated the entire async pipeline: enqueue returned instantly, the worker polled and picked it up within 30s, the generate() function ran, and on failure the job was atomically moved to failed/ with the error captured. The Signal failure notification would have also fired (the notify_failure call happens in the worker’s exception handler). The only thing untested is a successful generation + publish + success notification, which requires a URL that NotebookLM can actually process.
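The atomic failed/ transition described above can be sketched as a file-queue state change. This is a minimal illustration, not the plugin's actual code: the directory names (`processing/`, `failed/`), the JSON job schema, and the `fail_job` helper are all assumptions.

```python
import json
import os
from pathlib import Path

def fail_job(queue_dir: Path, job_id: str, error: str) -> None:
    """Record the error, then atomically move the job file to failed/.

    os.replace is atomic on POSIX when source and destination live on the
    same filesystem, so a crash mid-transition never leaves the job
    visible in both states.
    """
    src = queue_dir / "processing" / f"{job_id}.json"
    dst = queue_dir / "failed" / f"{job_id}.json"
    job = json.loads(src.read_text())
    job["error"] = error                       # capture the failure reason first
    src.write_text(json.dumps(job))
    dst.parent.mkdir(parents=True, exist_ok=True)
    os.replace(src, dst)                       # atomic rename: processing/ -> failed/
```

A worker's exception handler would call something like `fail_job(...)` followed by the Signal `notify_failure` call.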
openclaw
Continued smoke testing the async podcast pipeline, confirming the same end-to-end behavior on the OpenClawd side of the plugin.
courses
Heavy day on the Gavel scheduler simulator. Added structured logging for the visualizer’s preprocessor, fixed telemetry emission in the num_total_jobs simulation mode, ran FGD vs Random placement comparisons, and built out the Phase E steady-state experiment infrastructure with fractional GPU support and heterogeneous cluster configs.
- The scheduler uses `self._priorities[worker_type][job_id]` to track per-job, per-worker-type priorities that drive the greedy scheduling order. This is the value the visualizer’s log parser expects in the `Priority:` field.
- The `_logger` is a `SchedulerAdapter` wrapping a standard Python logger with a `StreamHandler`. To save logs to files, we need to add a `FileHandler` to the underlying `_orig_logger`.
- The preprocessor’s `parse_allocation()` expects the exact format `[Micro-task scheduled]\tJob ID: X\tWorker type: Y\tWorker ID(s): Z\tPriority: P` — the tab-separated fields are parsed by regex.
- Using `.get()` with a fallback of `1.0` makes this robust — if `_priorities` hasn’t been populated for a worker type/job combo (e.g., during FIFO scheduling where priorities aren’t computed), we won’t crash.
- The log line goes before the `_remove_available_worker_id` calls, so `worker_ids` is still the full list at this point.
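The `FileHandler` fix can be sketched as below. This is illustrative: a `logging.LoggerAdapter` (what `SchedulerAdapter` presumably wraps) does not own handlers, so the `FileHandler` must be attached to the wrapped logger (the scheduler's `_orig_logger`). The logger name, file name, and format string here are placeholders.

```python
import logging

# The adapter delegates to the underlying logger; handlers live on the logger.
orig_logger = logging.getLogger("scheduler")
orig_logger.setLevel(logging.DEBUG)
adapter = logging.LoggerAdapter(orig_logger, extra={})

file_handler = logging.FileHandler("scheduler.log")    # persists log lines to disk
file_handler.setFormatter(logging.Formatter("%(message)s"))
orig_logger.addHandler(file_handler)                   # attach to the wrapped logger

adapter.debug("[Micro-task scheduled]\tJob ID: 0\tWorker type: v100"
              "\tWorker ID(s): 0\tPriority: 1.0")
```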
- The `parse_allocation` regex in `log_parser.py` uses `r'Job ID:\s*(\d+)'` — this matches because our `%s` format of `job_id` will produce the integer job ID.
- However, `job_id` in the scheduler is a `JobIdPair` object, not a plain int. Let me verify how it stringifies.
- The integration test confirms no regression — the new logging is purely additive and doesn’t affect scheduling behavior or deterministic JCT outputs.
- Notice that there are two kinds of `[Micro-task scheduled]` log lines in the output: ones from `_print_schedule_summary()` (existing, with Deficit/Allocation fields) and our new ones (simpler, just Job ID/Worker type/Worker IDs/Priority). The preprocessor’s `parse_allocation()` regex matches both since it only looks for `[Micro-task scheduled]` + `Job ID:` + `Worker type:` + `Worker ID(s):`.
- The 4x duplication in the logs is because the test output shows 4 handler repetitions (the logger has multiple handlers attached in the test harness) — this won’t happen in normal usage with the file handler.
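A pattern with the shape described above would match both line variants. This is a guessed reconstruction based only on the fields named here; the actual regex in `log_parser.py` may differ.

```python
import re

# Requires only the first four fields, so both the existing summary lines
# (with Deficit/Allocation suffixes) and the new simpler lines match.
ALLOC_RE = re.compile(
    r"\[Micro-task scheduled\].*Job ID:\s*(\d+).*Worker type:\s*(\w+)"
    r".*Worker ID\(s\):\s*([\d,\s]+)")

old_line = ("[Micro-task scheduled]\tJob ID: 7\tWorker type: p100"
            "\tWorker ID(s): 3\tDeficit: 0.1\tAllocation: 0.5")
new_line = ("[Micro-task scheduled]\tJob ID: 7\tWorker type: p100"
            "\tWorker ID(s): 3\tPriority: 1.0")

assert ALLOC_RE.search(old_line) and ALLOC_RE.search(new_line)
```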
- The `.viz.bin` files are only ~4.7K each. This is expected for a simulation with no TELEMETRY events — the FIFO scheduler in this config only logs TELEMETRY at specific intervals. The preprocessor only generates round data when it encounters TELEMETRY lines, so few rounds = a small file.
- Let me verify that TELEMETRY events were actually emitted in the logs.
- The `if jobs_to_complete is not None:` block at line 1392 contains ALL telemetry emission, and the `num_total_jobs` exit path is in the `elif` at line 1522. When `num_total_jobs` is set (our case), we skip telemetry entirely.
- The cleanest fix: emit telemetry unconditionally (it’s just logging), but keep the saturation/early-exit logic gated since it references `jobs_to_complete`. I’ll also need to add the `num_total_jobs` exit check after telemetry.
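The restructured control flow looks roughly like the toy model below. The class and method names are illustrative stand-ins for the scheduler's loop, not its real code; the point is the shape: telemetry runs unconditionally, exit conditions stay mode-specific.

```python
class SimLoop:
    """Toy model of the control-flow fix: telemetry in every mode,
    gated exit checks afterwards."""

    def __init__(self, jobs_to_complete=None, num_total_jobs=None):
        self.jobs_to_complete = jobs_to_complete
        self.num_total_jobs = num_total_jobs
        self.completed = 0
        self.telemetry_rounds = 0

    def emit_telemetry(self):
        self.telemetry_rounds += 1      # stand-in for the TELEMETRY log lines

    def round_done(self) -> bool:
        # Telemetry is now emitted for every simulation mode...
        self.emit_telemetry()
        # ...while the exit conditions remain mode-specific.
        if self.jobs_to_complete is not None:
            return self.completed >= len(self.jobs_to_complete)
        elif self.num_total_jobs is not None:
            return self.completed >= self.num_total_jobs
        return False
```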
- The key thing to look for in the comparison: FGD placement should show jobs packed more tightly on fewer servers (consolidated/compact GPU allocations), while Random placement scatters jobs across many servers, leaving fragmented gaps.
- The Phase D results already quantify this: FGD has lower average fragmentation (1.88) vs Random (2.00), and slightly lower JCT (185,586 vs 186,353).
- The refactor to emit TELEMETRY unconditionally was necessary because the `num_total_jobs` code path (used by `run_fgd_experiments.py`) was in an `elif` branch that skipped telemetry. Now telemetry is emitted for all simulation modes, and the `.viz.bin` files contain 6618 rounds of data each.
- `lam` in the scheduler is the inter-arrival time parameter (NOT jobs/hr). At line 1689, `1.0/lam` is passed as the rate to the exponential sampler. `_sample_arrival_time_delta(rate)` returns `-log(U)/rate`, which is an exponential with mean `1/rate = lam`.
- So `lam=1.0` means an average inter-arrival time of 1 second (3600 jobs/hr). `lam=3600` means 1 job/hr.
- The replication configs use `lambda: 3600` for 1 job/hr. The Phase E config uses `lam: 1.0`, which is 3600 jobs/hr — that’s extremely high load, explaining why Phase D ran into deadlocks with 100 jobs.
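The `lam`-is-a-time convention can be verified empirically with a standalone inverse-CDF sampler. The function below is a sketch mirroring the described behavior (`-log(U)/rate`), not the scheduler's own `_sample_arrival_time_delta`:

```python
import math
import random

def sample_arrival_time_delta(rate: float) -> float:
    # Inverse-CDF sampling: -ln(U)/rate is exponential with mean 1/rate.
    # The scheduler passes rate = 1.0/lam, so the deltas have mean lam.
    u = 1.0 - random.random()          # in (0, 1], avoids log(0)
    return -math.log(u) / rate

random.seed(0)
lam = 3600.0                           # intended 1 job/hr
deltas = [sample_arrival_time_delta(1.0 / lam) for _ in range(100_000)]
mean = sum(deltas) / len(deltas)       # lands near lam (seconds), not 1/lam
```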
The key difference between the existing “fixed_jobs” mode and the new “steady_state” mode is how the simulation endpoint is determined. In fixed_jobs mode, the simulator runs until num_total_jobs finish. In steady-state mode, the simulator generates jobs continuously but only measures JCT for a window of jobs (4000-5000), letting the system reach equilibrium first. This avoids cold-start artifacts and matches the methodology from the Gavel paper’s own replication scripts.
The max_jct parameter acts as a safety valve: if any job in the measurement window exceeds this threshold (100 hours), the system is declared “saturated” and JCT is reported as infinity, rather than waiting forever.
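The measurement-window logic plus the `max_jct` safety valve amounts to something like the sketch below. Parameter shapes and the helper name are assumptions, not the runner's actual interface:

```python
def window_mean_jct(jcts, window=(4000, 5000), max_jct_hours=100.0):
    """Average JCT (seconds) over the measurement window only; report
    infinity when any window job trips the max_jct safety valve."""
    lo, hi = window
    measured = jcts[lo:hi]             # cold-start jobs before `lo` are ignored
    if any(jct > max_jct_hours * 3600.0 for jct in measured):
        return float("inf")            # system declared saturated
    return sum(measured) / len(measured)
```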
The 5 configs differ in two dimensions: (1) whether the policy is heterogeneity-aware (_perf suffix), and (2) how GPUs are placed onto servers when FGD is enabled. The baseline_strided uses the non-_perf policy with default strided placement, gavel_strided uses the _perf policy but still strided, and the FGD variants (random, bestfit, fgd) use _perf plus fragmentation-aware scheduling with different placement heuristics. This lets us isolate the contribution of each component.
The scheduler’s simulate() expects num_gpus_per_server as a dict like {"v100": 4, "p100": 4, "k80": 4} because in principle different GPU types could have different server configurations. The old phase_e.json already had it in dict form. Adding the int-to-dict expansion in the runner is a convenience that keeps the config files cleaner and prevents this gotcha.
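The int-to-dict expansion is a few lines; this is a hypothetical version of the runner helper, with the function name assumed:

```python
def expand_num_gpus_per_server(value, worker_types):
    # Accept a bare int in the config and expand it into the per-GPU-type
    # dict that simulate() expects; pass explicit dicts through unchanged.
    if isinstance(value, int):
        return {wt: value for wt in worker_types}
    return dict(value)                 # already a dict, e.g. the old phase_e.json form
```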
The pair throughput logic has three paths: (1) estimated throughputs using reference models, (2) no oracle data, and (3) oracle lookup of measured colocated throughputs. For fractional GPU pairs, we need to intercept before all three paths — if either job has gpu_request < 1.0, we use our own logic (scaled solo throughputs, with a packing constraint) instead of the oracle colocated throughputs, since the oracle data doesn’t cover fractional sharing scenarios.
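The interception can be sketched as below. The `Job` type, the linear throughput scaling, and the oracle's keying are all assumptions for illustration; Gavel's actual data structures differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    gpu_request: float                 # e.g. 0.25, 0.5, or 1.0

def pair_throughputs(a, b, solo, oracle):
    """Intercept fractional-GPU pairs before the oracle lookup, since the
    oracle only covers whole-GPU colocation."""
    if a.gpu_request < 1.0 or b.gpu_request < 1.0:
        # Packing constraint: the pair must fit on one physical GPU.
        if a.gpu_request + b.gpu_request > 1.0:
            return (0.0, 0.0)
        # Scaled solo throughputs stand in for the missing oracle data.
        return (solo[a.name] * a.gpu_request, solo[b.name] * b.gpu_request)
    return oracle[(a.name, b.name)]    # whole-GPU pairs: measured colocated data
```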
- `lam=36.0` means 36 seconds between arrivals on average = 100 jobs/hr, which is roughly 1.7 jph per 108-GPU equivalent (low load on a 6,200-GPU cluster).
- The `workload_mode` vs `fgd_workload_mode` distinction is intentional: `workload_mode` controls job generation (scale factor distribution), while `fgd_workload_mode` controls the FGD fragmentation calculator’s workload model (what job types it expects when computing fragmentation).
- The original Gavel code hardcoded `worker_types = ["v100", "p100", "k80"]` because the paper only used those 3 GPU types. This fails silently when using different GPU types — the scheduler just schedules 0 jobs.
- The fix derives worker types from the registered workers, making the scheduler generic for any GPU type set.
- The `_worker_type_to_worker_id_mapping` is populated by `_register_worker_callback()` during simulation setup, so it always reflects the actual cluster topology.
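The fix itself is small; a plausible sketch (the helper name is invented, only the mapping attribute comes from the log above):

```python
def derive_worker_types(worker_type_to_worker_id_mapping):
    # Take the GPU types from the mapping that _register_worker_callback()
    # fills in during setup, instead of a hardcoded 3-type list.
    return sorted(worker_type_to_worker_id_mapping)
```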
- The FGD placement adapter operates at the worker assignment level, not the GPU sharing level. Each job needs
scale_factorphysical worker IDs, regardless of its fractional GPU request. - Fractional GPU sharing (0.25/0.5 GPU jobs) is handled by Gavel’s
JobIdPairmechanism: two fractional jobs are paired and share one worker ID. This pairing happens in the LP allocation, not during placement. - FGD’s fragmentation metric correctly models fractional tasks (the workload distribution includes 0.25/0.5 GPU types), but the placement request must match physical GPU slot count.
- Arrival rate scaling: Gavel’s original 1.0 jph on 108 GPUs scales to ~57 jph on 6,200 GPUs. I chose 60, 180, and 360 jph (lam=60, 20, 10 seconds between arrivals) to cover low, medium, and high load regimes.
- Lambda convention: `lam` in Gavel’s Poisson process is the inter-arrival time in seconds, not the rate. So `lam=60` means 60 seconds between arrivals = 60 jobs/hr, and `lam=10` = 360 jobs/hr.
- Total: 45 experiments (5 configs x 3 rates x 3 seeds), matching the design doc’s per-figure count.
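The 45-experiment grid enumerates as a simple cross product. The config names and seed values below are illustrative placeholders; only the counts (5 x 3 x 3) come from the plan above.

```python
from itertools import product

configs = ["baseline_strided", "gavel_strided",      # placeholder names
           "fgd_random", "fgd_bestfit", "fgd_fgd"]
lams = [60, 20, 10]          # seconds between arrivals -> 60 / 180 / 360 jph
seeds = [0, 1, 2]            # placeholder seed values

experiments = list(product(configs, lams, seeds))
```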
personal-finance
Built out the pf personal finance CLI tool, implementing a flow-based budgeting model with 2026 tax calculations. Added income tracking from YNAB exports, 401(k) mega backdoor Roth calculations, emergency fund tracking, and spending detail views with category breakdowns.
- The original plan had 5 layers (Claude Code Skill -> Python scripts -> Azure SQL/KeyVault/Functions -> GitHub Pages webapp). What actually got built is 2 layers: Claude Code conversation -> `pf` CLI commands + local SQLite. The simplification happened naturally as each layer proved unnecessary.
- The budgeting philosophy shift from per-category allocation to flow-based waterfall is the biggest conceptual change. It mirrors how most people actually think about money: “how much comes in, how much goes to taxes/savings, how much is left?”
- The provenance column in `budget flow` is an underrated feature — it makes the system self-documenting. Every number traces back to a specific `pf` command and data source.
- The OBBBA (One Big Beautiful Bill Act) made the 2017 TCJA tax rates permanent and added extra inflation adjustments for the 10% and 12% brackets (4% vs 2.3% for higher brackets). This means the lower brackets expanded more than usual for 2026.
- The DCFSA limit jumped from $5,000 to $7,500 under the OBBBA — a significant increase that adds $2,500 more pre-tax savings.
- MFS brackets are typically the same as single except the 35% bracket, which is half of MFJ’s threshold ($384,350 vs $768,700).
- The mega backdoor Roth uses the gap between employee contributions and the total annual additions limit ($72,000 for 2026). The employer match counts against this total, so it reduces the after-tax space.
- Microsoft’s 50% match on $24,500 = $12,250. So: $72,000 - $24,500 (Roth) - $12,250 (match) = $35,250 available for after-tax.
- This is $500 more than 2025’s $34,750, because the total limit grew by $2,000 while the employee limit grew by $1,000 (leaving $1,000 more for match + after-tax, of which $500 goes to the larger match).
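The mega backdoor arithmetic above reduces to one subtraction chain; a small sketch with a hypothetical helper name:

```python
def after_tax_space(total_limit: int, employee_limit: int, match_rate: float) -> int:
    # After-tax (mega backdoor Roth) space = total annual additions limit
    # minus the employee contribution minus the employer match, which
    # counts against the total.
    match = int(match_rate * employee_limit)
    return total_limit - employee_limit - match

space_2026 = after_tax_space(72_000, 24_500, 0.5)    # $72,000 - $24,500 - $12,250
space_2025 = after_tax_space(70_000, 23_500, 0.5)    # $70,000 - $23,500 - $11,750
```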
- The “Total Savings” row now pulls double duty: it’s both a sum and a comparison against the emergency fund target. The plan column shows $50,000 (the target), and the delta shows how far off you are (-$5,439). This makes the gap immediately visible in each check-in.
- Checking is not rolled into a “Total Cash” with savings anymore, because they serve different purposes: checking is for daily transactions, savings is the emergency fund being tracked against a goal.
- YNAB uses “Inflow: Ready to Assign” for both real income and housekeeping entries (starting balances, reconciliation adjustments). The filter now excludes payees matching “starting balance”, “manual balance adjustment”, and “reconciliation balance adjustment” (case-insensitive). This is a pattern to watch — if YNAB adds new housekeeping payee names, they’d need to be added to the exclusion list.
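The exclusion filter can be sketched as below. The function name and the exact category string check are assumptions; the three payee substrings and the case-insensitive matching come from the entry above.

```python
import re

# Housekeeping payees that YNAB also books under "Inflow: Ready to Assign".
# If YNAB adds new housekeeping payee names, extend this pattern.
HOUSEKEEPING_PAYEES = re.compile(
    r"starting balance|manual balance adjustment|reconciliation balance adjustment",
    re.IGNORECASE)

def is_real_income(payee: str, category: str) -> bool:
    return (category == "Inflow: Ready to Assign"
            and not HOUSEKEEPING_PAYEES.search(payee))
```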