Bhavana AI

AI/ML insights

Dev Log: February 4, 2026

tools

Ran a smoke test validating the full async podcast pipeline end-to-end, from job enqueue through worker pickup, generation attempt, and failure handling with atomic job state transitions.

The smoke test validated the entire async pipeline: enqueue returned instantly, the worker polled and picked it up within 30s, the generate() function ran, and on failure the job was atomically moved to failed/ with the error captured. The Signal failure notification would have also fired (the notify_failure call happens in the worker’s exception handler). The only thing untested is a successful generation + publish + success notification, which requires a URL that NotebookLM can actually process.


openclaw

Continued smoke testing the async podcast pipeline, confirming the same end-to-end behavior on the OpenClaw side of the plugin.

As on the tools side, enqueue returned instantly, the worker polled and picked the job up within 30s, generate() ran, and on failure the job was atomically moved to failed/ with the error captured; the Signal failure notification path (notify_failure in the worker’s exception handler) was exercised as well. The one untested path remains a successful generation + publish + success notification, which requires a URL that NotebookLM can actually process.


courses

Heavy day on the Gavel scheduler simulator. Added structured logging for the visualizer’s preprocessor, fixed telemetry emission in the num_total_jobs simulation mode, ran FGD vs Random placement comparisons, and built out the Phase E steady-state experiment infrastructure with fractional GPU support and heterogeneous cluster configs.

  • The scheduler uses self._priorities[worker_type][job_id] to track per-job, per-worker-type priorities that drive the greedy scheduling order. This is the value the visualizer’s log parser expects in the Priority: field.
  • The _logger is a SchedulerAdapter wrapping a standard Python logger with a StreamHandler. To save logs to files, we need to add a FileHandler to the underlying _orig_logger.
  • The preprocessor’s parse_allocation() expects the exact format [Micro-task scheduled]\tJob ID: X\tWorker type: Y\tWorker ID(s): Z\tPriority: P — the tab-separated fields are parsed by regex.
  • Using .get() with a fallback of 1.0 makes this robust — if _priorities hasn’t been populated for a worker type/job combo (e.g., during FIFO scheduling where priorities aren’t computed), we won’t crash.
  • The log line goes before the _remove_available_worker_id calls, so worker_ids is still the full list at this point.
  • The parse_allocation regex in log_parser.py uses r'Job ID:\s*(\d+)' — this matches because our %s format of job_id will produce the integer job ID.
  • However, job_id in the scheduler is a JobIdPair object, not a plain int. Let me verify how it stringifies.
  • The integration test confirms no regression — the new logging is purely additive and doesn’t affect scheduling behavior or deterministic JCT outputs.
  • Notice that there are two kinds of [Micro-task scheduled] log lines in the output: the existing ones from _print_schedule_summary() (with Deficit/Allocation fields) and our new ones (simpler: just Job ID/Worker type/Worker IDs/Priority). The preprocessor’s parse_allocation() regex matches both, since it only looks for [Micro-task scheduled] + Job ID: + Worker type: + Worker ID(s):.
  • The 4x duplication in the logs comes from the test harness attaching multiple handlers to the logger, so each record is emitted four times; this won’t happen in normal usage with just the file handler.
  • The .viz.bin files are only ~4.7K each. This is expected when few TELEMETRY events are emitted: the preprocessor only generates round data when it encounters TELEMETRY lines, and the FIFO scheduler in this config only logs TELEMETRY at specific intervals, so few rounds = small file.
  • Let me verify that TELEMETRY events were actually emitted in the logs.
  • The if jobs_to_complete is not None: block at line 1392 contains ALL telemetry emission, while the num_total_jobs exit path lives in the elif at line 1522. When num_total_jobs is set (our case), we skip telemetry entirely.
  • The cleanest fix: emit telemetry unconditionally (it’s just logging), but keep the saturation/early-exit logic gated since it references jobs_to_complete. I’ll also need to add the num_total_jobs exit check after telemetry.
  • The key thing to look for in the comparison: FGD placement should show jobs packed more tightly on fewer servers (consolidated/compact GPU allocations), while Random placement scatters jobs across many servers, leaving fragmented gaps.
  • The Phase D results already quantify this: FGD has lower average fragmentation (1.88) vs Random (2.00), and slightly lower JCT (185,586 vs 186,353).
  • The refactor to emit TELEMETRY unconditionally was necessary because the num_total_jobs code path (used by run_fgd_experiments.py) was in an elif branch that skipped telemetry. Now telemetry is emitted for all simulation modes, and the .viz.bin files contain 6618 rounds of data each.
  • lam in the scheduler is the inter-arrival time parameter (NOT jobs/hr). At line 1689, 1.0/lam is passed as the rate to the exponential sampler.
  • _sample_arrival_time_delta(rate) returns -log(U)/rate, which is an exponential with mean 1/rate = lam.
  • So lam=1.0 means average inter-arrival time of 1 second (3600 jobs/hr). lam=3600 means 1 job/hr.
  • The replication configs use lambda: 3600 for 1 job/hr. The Phase E config uses lam: 1.0 which is 3600 jobs/hr — that’s extremely high load, explaining why Phase D ran into deadlocks with 100 jobs.
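The log format and parser interaction above can be sketched with a minimal reconstruction. The tab-separated line format and the `Job ID:` regex are from the notes; the other two field regexes are illustrative of what log_parser.py’s parse_allocation() does, not copied from it:

```python
import re

# Reconstructed [Micro-task scheduled] line in the tab-separated format
# that parse_allocation() expects (field values are illustrative).
line = ("[Micro-task scheduled]\tJob ID: 42\tWorker type: v100"
        "\tWorker ID(s): 3,7\tPriority: 1.0")

job_id = re.search(r'Job ID:\s*(\d+)', line).group(1)  # regex from log_parser.py
worker_type = re.search(r'Worker type:\s*(\w+)', line).group(1)  # illustrative
priority = float(re.search(r'Priority:\s*([\d.]+)', line).group(1))  # illustrative
```

This also shows why the `.get()` fallback matters upstream: whatever value lands in the `Priority:` field must parse as a number, even when priorities were never computed.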
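The lam convention is easy to get backwards, so here is a self-contained sketch of the inverse-CDF sampling described in the bullets (the name _sample_arrival_time_delta is from the notes; this standalone body is my assumption of what it does):

```python
import math
import random

def sample_arrival_time_delta(rate):
    # Exponential inter-arrival time via inverse-CDF: -log(U)/rate.
    # Mean is 1/rate, so passing rate = 1.0/lam yields mean lam.
    u = 1.0 - random.random()  # u in (0, 1], avoids log(0)
    return -math.log(u) / rate

lam = 1.0  # mean inter-arrival time in seconds (NOT jobs/hr)
deltas = [sample_arrival_time_delta(1.0 / lam) for _ in range(100_000)]
mean_delta = sum(deltas) / len(deltas)  # should land close to lam
```

With lam=1.0 the empirical mean inter-arrival time comes out near 1 second, i.e. ~3600 jobs/hr, matching the Phase D load analysis above.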

The key difference between the existing “fixed_jobs” mode and the new “steady_state” mode is how the simulation endpoint is determined. In fixed_jobs mode, the simulator runs until num_total_jobs finish. In steady-state mode, the simulator generates jobs continuously but only measures JCT for a window of jobs (4000-5000), letting the system reach equilibrium first. This avoids cold-start artifacts and matches the methodology from the Gavel paper’s own replication scripts.

The max_jct parameter acts as a safety valve: if any job in the measurement window exceeds this threshold (100 hours), the system is declared “saturated” and JCT is reported as infinity, rather than waiting forever.
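The windowed measurement plus saturation valve can be sketched as follows (the function name and job-indexing scheme are assumptions for illustration, not Gavel’s actual API):

```python
MEASURE_START, MEASURE_END = 4000, 5000  # measure JCT only for jobs 4000-5000
MAX_JCT = 100 * 3600.0                   # 100-hour saturation threshold, in seconds

def steady_state_avg_jct(jct_by_arrival_index):
    # jct_by_arrival_index: {job index in arrival order: JCT in seconds}.
    # Jobs before the window only warm the system up; they are never measured.
    window = [jct_by_arrival_index[i] for i in range(MEASURE_START, MEASURE_END)]
    if any(jct > MAX_JCT for jct in window):
        return float("inf")  # saturated: report infinity instead of waiting forever
    return sum(window) / len(window)
```

The early jobs still have to run (they create the equilibrium backlog); they just never enter the average.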

The 5 configs differ in two dimensions: (1) whether the policy is heterogeneity-aware (_perf suffix), and (2) how GPUs are placed onto servers when FGD is enabled. The baseline_strided uses the non-_perf policy with default strided placement, gavel_strided uses the _perf policy but still strided, and the FGD variants (random, bestfit, fgd) use _perf plus fragmentation-aware scheduling with different placement heuristics. This lets us isolate the contribution of each component.

The scheduler’s simulate() expects num_gpus_per_server as a dict like {"v100": 4, "p100": 4, "k80": 4} because in principle different GPU types could have different server configurations. The old phase_e.json already had it in dict form. Adding the int-to-dict expansion in the runner is a convenience that keeps the config files cleaner and prevents this gotcha.
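The int-to-dict expansion in the runner might look like this (the helper name is hypothetical):

```python
def expand_num_gpus_per_server(value, worker_types):
    # Accept either a single int (same server size for every GPU type)
    # or an explicit per-type dict, and always return the dict form
    # that simulate() expects.
    if isinstance(value, int):
        return {wt: value for wt in worker_types}
    return dict(value)

expanded = expand_num_gpus_per_server(4, ["v100", "p100", "k80"])
```

So a config can just say `"num_gpus_per_server": 4` and the runner normalizes it to `{"v100": 4, "p100": 4, "k80": 4}` before calling simulate().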

The pair throughput logic has three paths: (1) estimated throughputs using reference models, (2) no oracle data, and (3) oracle lookup of measured colocated throughputs. For fractional GPU pairs, we need to intercept before all three paths — if either job has gpu_request < 1.0, we use our own logic (scaled solo throughputs, with a packing constraint) instead of the oracle colocated throughputs, since the oracle data doesn’t cover fractional sharing scenarios.
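A hedged sketch of that interception: the function name, return convention, and exact scaling rule are my assumptions; the notes only specify “scaled solo throughputs, with a packing constraint”:

```python
def fractional_pair_throughputs(req_a, solo_a, req_b, solo_b):
    # Only applies when at least one job requests a fractional GPU;
    # whole-GPU pairs fall through to the three oracle paths instead.
    assert req_a < 1.0 or req_b < 1.0
    # Packing constraint: both fractions must fit on one physical GPU.
    if req_a + req_b > 1.0:
        return None  # pair cannot co-locate
    # Scale each job's solo throughput by its fractional share.
    return (solo_a * req_a, solo_b * req_b)
```

For example, a 0.25-GPU job and a 0.5-GPU job can share a GPU and each runs at its share of solo throughput, while a 0.5 + 0.75 pair is rejected outright.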

  • lam=36.0 means 36 seconds between arrivals on average = 100 jobs/hr, which is roughly 1.7 jph per 108-GPU equivalent (low load on a 6,200-GPU cluster)
  • The workload_mode vs fgd_workload_mode distinction is intentional: workload_mode controls job generation (scale factor distribution), while fgd_workload_mode controls the FGD fragmentation calculator’s workload model (what job types it expects when computing fragmentation)
  • The original Gavel code hardcoded worker_types = ["v100", "p100", "k80"] because the paper only used those 3 GPU types. This fails silently when using different GPU types — the scheduler just schedules 0 jobs.
  • The fix derives worker types from the registered workers, making the scheduler generic for any GPU type set.
  • The _worker_type_to_worker_id_mapping is populated by _register_worker_callback() during simulation setup, so it always reflects the actual cluster topology.
  • The FGD placement adapter operates at the worker assignment level, not the GPU sharing level. Each job needs scale_factor physical worker IDs, regardless of its fractional GPU request.
  • Fractional GPU sharing (0.25/0.5 GPU jobs) is handled by Gavel’s JobIdPair mechanism: two fractional jobs are paired and share one worker ID. This pairing happens in the LP allocation, not during placement.
  • FGD’s fragmentation metric correctly models fractional tasks (the workload distribution includes 0.25/0.5 GPU types), but the placement request must match physical GPU slot count.
  • Arrival rate scaling: Gavel’s original 1.0 jph on 108 GPUs scales to ~57 jph on 6,200 GPUs. I chose 60, 180, and 360 jph (lam=60, 20, 10 seconds between arrivals) to cover low, medium, and high load regimes.
  • Lambda convention: lam in Gavel’s Poisson process is the inter-arrival time in seconds, not the rate. So lam=60 means 60 seconds between arrivals = 60 jobs/hr. lam=10 = 360 jobs/hr.
  • Total: 45 experiments (5 configs x 3 rates x 3 seeds), matching the design doc’s per-figure count.
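The lam conventions and cluster scaling in these bullets reduce to two one-liners (helper names are hypothetical):

```python
def lam_to_jph(lam_seconds):
    # lam is the mean inter-arrival time in seconds, not a rate
    return 3600.0 / lam_seconds

def scale_jph(ref_jph, ref_gpus, target_gpus):
    # hold per-GPU load constant when scaling up the cluster
    return ref_jph * target_gpus / ref_gpus
```

Sanity checks against the numbers above: lam_to_jph(36.0) gives 100 jobs/hr, lam_to_jph(60) gives 60, lam_to_jph(10) gives 360, and scale_jph(1.0, 108, 6200) gives Gavel’s original 1.0 jph scaled to ~57.4 jph on the 6,200-GPU cluster.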

personal-finance

Built out the pf personal finance CLI tool, implementing a flow-based budgeting model with 2026 tax calculations. Added income tracking from YNAB exports, 401(k) mega backdoor Roth calculations, emergency fund tracking, and spending detail views with category breakdowns.

  • The original plan had 5 layers (Claude Code Skill -> Python scripts -> Azure SQL/KeyVault/Functions -> GitHub Pages webapp). What actually got built is 2 layers: Claude Code conversation -> pf CLI commands + local SQLite. The simplification happened naturally as each layer proved unnecessary.
  • The budgeting philosophy shift from per-category allocation to flow-based waterfall is the biggest conceptual change. It mirrors how most people actually think about money: “how much comes in, how much goes to taxes/savings, how much is left?”
  • The provenance column in budget flow is an underrated feature — it makes the system self-documenting. Every number traces back to a specific pf command and data source.
  • The OBBBA (One Big Beautiful Bill Act) made the 2017 TCJA tax rates permanent and added extra inflation adjustments for the 10% and 12% brackets (4% vs 2.3% for higher brackets). This means the lower brackets expanded more than usual for 2026.
  • The DCFSA limit jumped from $5,000 to $7,500 under the OBBBA — a significant increase that adds $2,500 more pre-tax savings.
  • MFS brackets are typically the same as single except the 35% bracket, which is half of MFJ’s threshold ($384,350 vs $768,700).
  • The mega backdoor Roth uses the gap between employee contributions and the total annual additions limit ($72,000 for 2026). The employer match counts against this total, so it reduces the after-tax space.
  • Microsoft’s 50% match on $24,500 = $12,250. So: $72,000 - $24,500 (Roth) - $12,250 (match) = $35,250 available for after-tax.
  • This is $500 more than 2025’s $34,750, because the total limit grew by $2,000 while the employee limit grew by $1,000 (leaving $1,000 more for match + after-tax, of which $500 goes to the larger match).
  • The “Total Savings” row in the savings section now pulls double duty: it’s both a sum and a comparison against the emergency fund target. The plan column shows $50,000 (the target), and the delta shows how far off you are (-$5,439). This makes the gap immediately visible at each check-in.
  • Checking is not rolled into a “Total Cash” with savings anymore, because they serve different purposes: checking is for daily transactions, savings is the emergency fund being tracked against a goal.
  • YNAB uses “Inflow: Ready to Assign” for both real income and housekeeping entries (starting balances, reconciliation adjustments). The filter now excludes payees matching “starting balance”, “manual balance adjustment”, and “reconciliation balance adjustment” (case-insensitive). This is a pattern to watch — if YNAB adds new housekeeping payee names, they’d need to be added to the exclusion list.
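The mega backdoor Roth arithmetic above is simple enough to encode directly. A sketch, assuming a flat 50% match applied to the full employee contribution (the 2025 limits are back-derived from the notes: total grew by $2,000 to $72,000, employee by $1,000 to $24,500):

```python
def mega_backdoor_after_tax_space(total_limit, employee_limit, match_rate=0.5):
    # After-tax space = total annual additions limit
    #                 - employee (Roth) contributions
    #                 - employer match (which also counts against the total)
    match = match_rate * employee_limit
    return total_limit - employee_limit - match

space_2026 = mega_backdoor_after_tax_space(72_000, 24_500)  # -> 35250.0
space_2025 = mega_backdoor_after_tax_space(70_000, 23_500)  # -> 34750.0
```

The year-over-year delta of $500 falls straight out: the total limit grew $2,000, but $1,000 went to the employee limit and $500 to the larger match.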