Bhavana AI

AI/ML insights

Dev Log: February 4, 2026

tools

Ran a smoke test validating the full async podcast pipeline end-to-end, from job enqueue through worker pickup, generation attempt, and failure handling with atomic job state transitions.

The smoke test validated the entire async pipeline: enqueue returned instantly, the worker polled and picked it up within 30s, the generate() function ran, and on failure the job was atomically moved to failed/ with the error captured. The Signal failure notification would have also fired (the notify_failure call happens in the worker’s exception handler). The only thing untested is a successful generation + publish + success notification, which requires a URL that NotebookLM can actually process.


openclaw

Continued smoke testing the async podcast pipeline, confirming the same end-to-end behavior on the OpenClaw side of the plugin.

As on the tools side, enqueue returned instantly, the worker polled and picked the job up within 30s, generate() ran, and on failure the job was atomically moved to failed/ with the error captured; the Signal failure notification path (notify_failure in the worker’s exception handler) was exercised as well. The one untested path remains a successful generation + publish + success notification, which requires a URL that NotebookLM can actually process.


courses

Heavy day on the Gavel scheduler simulator. Added structured logging for the visualizer’s preprocessor, fixed telemetry emission in the num_total_jobs simulation mode, ran FGD vs Random placement comparisons, and built out the Phase E steady-state experiment infrastructure with fractional GPU support and heterogeneous cluster configs.

  • The scheduler uses self._priorities[worker_type][job_id] to track per-job, per-worker-type priorities that drive the greedy scheduling order. This is the value the visualizer’s log parser expects in the Priority: field.
  • The _logger is a SchedulerAdapter wrapping a standard Python logger with a StreamHandler. To save logs to files, we need to add a FileHandler to the underlying _orig_logger.
  • The preprocessor’s parse_allocation() expects the exact format [Micro-task scheduled]\tJob ID: X\tWorker type: Y\tWorker ID(s): Z\tPriority: P — the tab-separated fields are parsed by regex.
  • Using .get() with a fallback of 1.0 makes this robust — if _priorities hasn’t been populated for a worker type/job combo (e.g., during FIFO scheduling where priorities aren’t computed), we won’t crash.
  • The log line goes before the _remove_available_worker_id calls, so worker_ids is still the full list at this point.
  • The parse_allocation regex in log_parser.py uses r'Job ID:\s*(\d+)' — this matches because our %s format of job_id will produce the integer job ID.
  • However, job_id in the scheduler is a JobIdPair object, not a plain int. Let me verify how it stringifies.
  • The integration test confirms no regression — the new logging is purely additive and doesn’t affect scheduling behavior or deterministic JCT outputs.
  • Notice that there are two kinds of [Micro-task scheduled] log lines in the output: the existing ones from _print_schedule_summary() (with Deficit/Allocation fields) and our new ones (simpler: just Job ID/Worker type/Worker IDs/Priority). The preprocessor’s parse_allocation() regex matches both, since it only looks for [Micro-task scheduled] + Job ID: + Worker type: + Worker ID(s):.
  • The 4x duplication in the logs comes from the test harness attaching multiple handlers to the logger, so each record is emitted four times; this won’t happen in normal usage with just the file handler.
  • The .viz.bin files are only ~4.7K each. This is expected when few TELEMETRY events are emitted: the preprocessor only generates round data when it encounters TELEMETRY lines, and the FIFO scheduler in this config only logs TELEMETRY at specific intervals, so few rounds = small file.
  • Let me verify that TELEMETRY events were actually emitted in the logs.
  • The if jobs_to_complete is not None: block at line 1392 contains ALL telemetry emission, while the num_total_jobs exit path lives in the elif at line 1522. When num_total_jobs is set (our case), we skip telemetry entirely.
  • The cleanest fix: emit telemetry unconditionally (it’s just logging), but keep the saturation/early-exit logic gated since it references jobs_to_complete. I’ll also need to add the num_total_jobs exit check after telemetry.
  • The key thing to look for in the comparison: FGD placement should show jobs packed more tightly on fewer servers (consolidated/compact GPU allocations), while Random placement scatters jobs across many servers, leaving fragmented gaps.
  • The Phase D results already quantify this: FGD has lower average fragmentation (1.88) vs Random (2.00), and slightly lower JCT (185,586 vs 186,353).
  • The refactor to emit TELEMETRY unconditionally was necessary because the num_total_jobs code path (used by run_fgd_experiments.py) was in an elif branch that skipped telemetry. Now telemetry is emitted for all simulation modes, and the .viz.bin files contain 6618 rounds of data each.
  • lam in the scheduler is the inter-arrival time parameter (NOT jobs/hr). At line 1689, 1.0/lam is passed as the rate to the exponential sampler.
  • _sample_arrival_time_delta(rate) returns -log(U)/rate, which is an exponential with mean 1/rate = lam.
  • So lam=1.0 means average inter-arrival time of 1 second (3600 jobs/hr). lam=3600 means 1 job/hr.
  • The replication configs use lambda: 3600 for 1 job/hr. The Phase E config uses lam: 1.0 which is 3600 jobs/hr — that’s extremely high load, explaining why Phase D ran into deadlocks with 100 jobs.
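The log format and parser interaction above can be sketched with a minimal reconstruction. The tab-separated line format and the `Job ID:` regex are from the notes; the other two field regexes are illustrative of what log_parser.py’s parse_allocation() does, not copied from it:

```python
import re

# Reconstructed [Micro-task scheduled] line in the tab-separated format
# that parse_allocation() expects (field values are illustrative).
line = ("[Micro-task scheduled]\tJob ID: 42\tWorker type: v100"
        "\tWorker ID(s): 3,7\tPriority: 1.0")

job_id = re.search(r'Job ID:\s*(\d+)', line).group(1)  # regex from log_parser.py
worker_type = re.search(r'Worker type:\s*(\w+)', line).group(1)  # illustrative
priority = float(re.search(r'Priority:\s*([\d.]+)', line).group(1))  # illustrative
```

This also shows why the `.get()` fallback matters upstream: whatever value lands in the `Priority:` field must parse as a number, even when priorities were never computed.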
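The lam convention is easy to get backwards, so here is a self-contained sketch of the inverse-CDF sampling described in the bullets (the name _sample_arrival_time_delta is from the notes; this standalone body is my assumption of what it does):

```python
import math
import random

def sample_arrival_time_delta(rate):
    # Exponential inter-arrival time via inverse-CDF: -log(U)/rate.
    # Mean is 1/rate, so passing rate = 1.0/lam yields mean lam.
    u = 1.0 - random.random()  # u in (0, 1], avoids log(0)
    return -math.log(u) / rate

lam = 1.0  # mean inter-arrival time in seconds (NOT jobs/hr)
deltas = [sample_arrival_time_delta(1.0 / lam) for _ in range(100_000)]
mean_delta = sum(deltas) / len(deltas)  # should land close to lam
```

With lam=1.0 the empirical mean inter-arrival time comes out near 1 second, i.e. ~3600 jobs/hr, matching the Phase D load analysis above.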

The key difference between the existing “fixed_jobs” mode and the new “steady_state” mode is how the simulation endpoint is determined. In fixed_jobs mode, the simulator runs until num_total_jobs finish. In steady-state mode, the simulator generates jobs continuously but only measures JCT for a window of jobs (4000-5000), letting the system reach equilibrium first. This avoids cold-start artifacts and matches the methodology from the Gavel paper’s own replication scripts.

The max_jct parameter acts as a safety valve: if any job in the measurement window exceeds this threshold (100 hours), the system is declared “saturated” and JCT is reported as infinity, rather than waiting forever.
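The windowed measurement plus saturation valve can be sketched as follows (the function name and job-indexing scheme are assumptions for illustration, not Gavel’s actual API):

```python
MEASURE_START, MEASURE_END = 4000, 5000  # measure JCT only for jobs 4000-5000
MAX_JCT = 100 * 3600.0                   # 100-hour saturation threshold, in seconds

def steady_state_avg_jct(jct_by_arrival_index):
    # jct_by_arrival_index: {job index in arrival order: JCT in seconds}.
    # Jobs before the window only warm the system up; they are never measured.
    window = [jct_by_arrival_index[i] for i in range(MEASURE_START, MEASURE_END)]
    if any(jct > MAX_JCT for jct in window):
        return float("inf")  # saturated: report infinity instead of waiting forever
    return sum(window) / len(window)
```

The early jobs still have to run (they create the equilibrium backlog); they just never enter the average.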

The 5 configs differ in two dimensions: (1) whether the policy is heterogeneity-aware (_perf suffix), and (2) how GPUs are placed onto servers when FGD is enabled. The baseline_strided uses the non-_perf policy with default strided placement, gavel_strided uses the _perf policy but still strided, and the FGD variants (random, bestfit, fgd) use _perf plus fragmentation-aware scheduling with different placement heuristics. This lets us isolate the contribution of each component.

The scheduler’s simulate() expects num_gpus_per_server as a dict like {"v100": 4, "p100": 4, "k80": 4} because in principle different GPU types could have different server configurations. The old phase_e.json already had it in dict form. Adding the int-to-dict expansion in the runner is a convenience that keeps the config files cleaner and prevents this gotcha.
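The int-to-dict expansion in the runner might look like this (the helper name is hypothetical):

```python
def expand_num_gpus_per_server(value, worker_types):
    # Accept either a single int (same server size for every GPU type)
    # or an explicit per-type dict, and always return the dict form
    # that simulate() expects.
    if isinstance(value, int):
        return {wt: value for wt in worker_types}
    return dict(value)

expanded = expand_num_gpus_per_server(4, ["v100", "p100", "k80"])
```

So a config can just say `"num_gpus_per_server": 4` and the runner normalizes it to `{"v100": 4, "p100": 4, "k80": 4}` before calling simulate().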

The pair throughput logic has three paths: (1) estimated throughputs using reference models, (2) no oracle data, and (3) oracle lookup of measured colocated throughputs. For fractional GPU pairs, we need to intercept before all three paths — if either job has gpu_request < 1.0, we use our own logic (scaled solo throughputs, with a packing constraint) instead of the oracle colocated throughputs, since the oracle data doesn’t cover fractional sharing scenarios.
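A hedged sketch of that interception: the function name, return convention, and exact scaling rule are my assumptions; the notes only specify “scaled solo throughputs, with a packing constraint”:

```python
def fractional_pair_throughputs(req_a, solo_a, req_b, solo_b):
    # Only applies when at least one job requests a fractional GPU;
    # whole-GPU pairs fall through to the three oracle paths instead.
    assert req_a < 1.0 or req_b < 1.0
    # Packing constraint: both fractions must fit on one physical GPU.
    if req_a + req_b > 1.0:
        return None  # pair cannot co-locate
    # Scale each job's solo throughput by its fractional share.
    return (solo_a * req_a, solo_b * req_b)
```

For example, a 0.25-GPU job and a 0.5-GPU job can share a GPU and each runs at its share of solo throughput, while a 0.5 + 0.75 pair is rejected outright.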

  • lam=36.0 means 36 seconds between arrivals on average = 100 jobs/hr, which is roughly 1.7 jph per 108-GPU equivalent (low load on a 6,200-GPU cluster)
  • The workload_mode vs fgd_workload_mode distinction is intentional: workload_mode controls job generation (scale factor distribution), while fgd_workload_mode controls the FGD fragmentation calculator’s workload model (what job types it expects when computing fragmentation)
  • The original Gavel code hardcoded worker_types = ["v100", "p100", "k80"] because the paper only used those 3 GPU types. This fails silently when using different GPU types — the scheduler just schedules 0 jobs.
  • The fix derives worker types from the registered workers, making the scheduler generic for any GPU type set.
  • The _worker_type_to_worker_id_mapping is populated by _register_worker_callback() during simulation setup, so it always reflects the actual cluster topology.
  • The FGD placement adapter operates at the worker assignment level, not the GPU sharing level. Each job needs scale_factor physical worker IDs, regardless of its fractional GPU request.
  • Fractional GPU sharing (0.25/0.5 GPU jobs) is handled by Gavel’s JobIdPair mechanism: two fractional jobs are paired and share one worker ID. This pairing happens in the LP allocation, not during placement.
  • FGD’s fragmentation metric correctly models fractional tasks (the workload distribution includes 0.25/0.5 GPU types), but the placement request must match physical GPU slot count.
  • Arrival rate scaling: Gavel’s original 1.0 jph on 108 GPUs scales to ~57 jph on 6,200 GPUs. I chose 60, 180, and 360 jph (lam=60, 20, 10 seconds between arrivals) to cover low, medium, and high load regimes.
  • Lambda convention: lam in Gavel’s Poisson process is the inter-arrival time in seconds, not the rate. So lam=60 means 60 seconds between arrivals = 60 jobs/hr. lam=10 = 360 jobs/hr.
  • Total: 45 experiments (5 configs x 3 rates x 3 seeds), matching the design doc’s per-figure count.
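The lam conventions and cluster scaling in these bullets reduce to two one-liners (helper names are hypothetical):

```python
def lam_to_jph(lam_seconds):
    # lam is the mean inter-arrival time in seconds, not a rate
    return 3600.0 / lam_seconds

def scale_jph(ref_jph, ref_gpus, target_gpus):
    # hold per-GPU load constant when scaling up the cluster
    return ref_jph * target_gpus / ref_gpus
```

Sanity checks against the numbers above: lam_to_jph(36.0) gives 100 jobs/hr, lam_to_jph(60) gives 60, lam_to_jph(10) gives 360, and scale_jph(1.0, 108, 6200) gives Gavel’s original 1.0 jph scaled to ~57.4 jph on the 6,200-GPU cluster.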

personal-finance

Built out the pf personal finance CLI tool, implementing a flow-based budgeting model with 2026 tax calculations. Added income tracking from YNAB exports, 401(k) mega backdoor Roth calculations, emergency fund tracking, and spending detail views with category breakdowns.

  • The original plan had 5 layers (Claude Code Skill -> Python scripts -> Azure SQL/KeyVault/Functions -> GitHub Pages webapp). What actually got built is 2 layers: Claude Code conversation -> pf CLI commands + local SQLite. The simplification happened naturally as each layer proved unnecessary.
  • The budgeting philosophy shift from per-category allocation to flow-based waterfall is the biggest conceptual change. It mirrors how most people actually think about money: “how much comes in, how much goes to taxes/savings, how much is left?”
  • The provenance column in budget flow is an underrated feature — it makes the system self-documenting. Every number traces back to a specific pf command and data source.
  • The OBBBA (One Big Beautiful Bill Act) made the 2017 TCJA tax rates permanent and added extra inflation adjustments for the 10% and 12% brackets (4% vs 2.3% for higher brackets). This means the lower brackets expanded more than usual for 2026.
  • The DCFSA limit jumped from $5,000 to $7,500 under the OBBBA — a significant increase that adds $2,500 more pre-tax savings.
  • MFS brackets are typically the same as single except the 35% bracket, which is half of MFJ’s threshold ($384,350 vs $768,700).
  • The mega backdoor Roth uses the gap between employee contributions and the total annual additions limit ($72,000 for 2026). The employer match counts against this total, so it reduces the after-tax space.
  • Microsoft’s 50% match on $24,500 = $12,250. So: $72,000 - $24,500 (Roth) - $12,250 (match) = $35,250 available for after-tax.
  • This is $500 more than 2025’s $34,750, because the total limit grew by $2,000 while the employee limit grew by $1,000 (leaving $1,000 more for match + after-tax, of which $500 goes to the larger match).
  • The “Total Savings” row in the savings section now pulls double duty: it’s both a sum and a comparison against the emergency fund target. The plan column shows $50,000 (the target), and the delta shows how far off you are (-$5,439). This makes the gap immediately visible at each check-in.
  • Checking is not rolled into a “Total Cash” with savings anymore, because they serve different purposes: checking is for daily transactions, savings is the emergency fund being tracked against a goal.
  • YNAB uses “Inflow: Ready to Assign” for both real income and housekeeping entries (starting balances, reconciliation adjustments). The filter now excludes payees matching “starting balance”, “manual balance adjustment”, and “reconciliation balance adjustment” (case-insensitive). This is a pattern to watch — if YNAB adds new housekeeping payee names, they’d need to be added to the exclusion list.
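The mega backdoor Roth arithmetic above is simple enough to encode directly. A sketch, assuming a flat 50% match applied to the full employee contribution (the 2025 limits are back-derived from the notes: total grew by $2,000 to $72,000, employee by $1,000 to $24,500):

```python
def mega_backdoor_after_tax_space(total_limit, employee_limit, match_rate=0.5):
    # After-tax space = total annual additions limit
    #                 - employee (Roth) contributions
    #                 - employer match (which also counts against the total)
    match = match_rate * employee_limit
    return total_limit - employee_limit - match

space_2026 = mega_backdoor_after_tax_space(72_000, 24_500)  # -> 35250.0
space_2025 = mega_backdoor_after_tax_space(70_000, 23_500)  # -> 34750.0
```

The year-over-year delta of $500 falls straight out: the total limit grew $2,000, but $1,000 went to the employee limit and $500 to the larger match.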