Bhavana AI

AI/ML insights

Dev Log: February 7, 2026

personal-finance

Built a paystub upload endpoint that accepts PDF uploads directly via Bun’s native FormData support, then added pay-period-based projection logic that correctly annualizes biweekly income (using totalPeriods / currentPeriod instead of calendar fractions). Also implemented a full provenance system that traces every numeric value in the budget flow table back to its raw data sources, with badge indicators showing where each number came from. Fixed the tax sign convention (paystub taxes are negative deductions, budget expects positive amounts) and added dual projection factors so income uses pay-period-based extrapolation while spending uses calendar-fraction projection.

The existing extractPdfText already creates a Uint8Array from a file buffer. The new extractPdfTextFromBuffer will accept an ArrayBuffer directly, which is what Bun.serve() provides from request.arrayBuffer(). This avoids writing uploaded files to disk.
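A minimal sketch of what that buffer-based variant could look like. The `pdfToText` helper here is a stand-in for whatever PDF extraction library the project actually uses; only the `ArrayBuffer → Uint8Array` wrapping reflects the described change.

```typescript
// Stand-in for the real PDF text extraction; here it just decodes bytes.
async function pdfToText(bytes: Uint8Array): Promise<string> {
  return new TextDecoder().decode(bytes);
}

// New variant: accepts the ArrayBuffer that Bun.serve() handlers get
// from request.arrayBuffer(), wrapping it without a disk round-trip.
export async function extractPdfTextFromBuffer(buf: ArrayBuffer): Promise<string> {
  return pdfToText(new Uint8Array(buf));
}
```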

The paystub fallback follows the same pattern as W-2 lookups: Federal tax sums across all persons, while SS/Medicare/PFML are per-person. WA PFML is unique — it’s the sum of WA-FLI + WA-MLI from Kristin’s paystub, matching how the budget plan defines it as a single line item.

The projection factor is computed from the latest paystub pay_period_end date. If we have a January 31 paystub, monthsElapsed = 1.0, so projectionFactor = 12. This annualizes YTD figures by simple linear extrapolation. Summary/total lines recompute projected from their component lines (sum of parts) to avoid rounding drift from projecting the total directly.
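A sketch of the month-fraction arithmetic, assuming "months elapsed" means whole months completed plus the fraction of the current month (the formula is inferred from the January 31 example, not taken from the real code):

```typescript
// Whole months completed plus the fraction of the current month.
function monthsElapsed(d: Date): number {
  // Day 0 of the next month is the last day of this month.
  const daysInMonth = new Date(d.getFullYear(), d.getMonth() + 1, 0).getDate();
  return d.getMonth() + d.getDate() / daysInMonth;
}

// A Jan 31 paystub: monthsElapsed = 1.0, so the factor is exactly 12.
const projectionFactor = 12 / monthsElapsed(new Date(2026, 0, 31));

// Summary lines: project components first, then sum, so the total
// can't drift from its parts through independent rounding.
const componentsYtd = [1200.5, 340.25]; // illustrative YTD values
const projectedTotal = componentsYtd
  .map((v) => v * projectionFactor)
  .reduce((a, b) => a + b, 0);
```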

The upload endpoint uses FormData from the request, which Bun’s Request natively supports via request.formData(). The upsert pattern (DELETE + INSERT in a transaction) ensures only one paystub per person per year exists, so uploading a newer paystub replaces the old one atomically.
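The DELETE + INSERT upsert semantics can be sketched without the real schema. Type and field names below are illustrative; the point is that the delete-then-insert happens as one atomic unit, so at most one paystub per person per year survives and a failed insert rolls back the delete.

```typescript
type Paystub = { person: string; year: number; ytdGross: number };

class PaystubStore {
  private rows: Paystub[] = [];

  // Mimics DELETE + INSERT inside a transaction: snapshot, mutate,
  // roll back on failure.
  upsert(stub: Paystub): void {
    const snapshot = [...this.rows];
    try {
      this.rows = this.rows.filter(
        (r) => !(r.person === stub.person && r.year === stub.year),
      );
      this.rows.push(stub);
    } catch (err) {
      this.rows = snapshot; // restore pre-transaction state
      throw err;
    }
  }

  get(person: string, year: number): Paystub | undefined {
    return this.rows.find((r) => r.person === person && r.year === year);
  }
}
```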

The Projected column header includes the projectionMonths context (e.g., “Projected (1.0 mo)”) so users understand the basis of the extrapolation. The colSpan for section headers and expand rows increases from 6 to 7 to account for the new column.

Bun’s routes option in Bun.serve() maps paths to route handlers. Using { GET: fn } or { POST: fn } creates method-specific handlers. The issue is likely that Bun.serve() routes either require a single handler function covering all methods, or that the POST route format differs from GET in newer Bun versions.

The paystub parser represents taxes as negative numbers (deductions from earnings), while the budget flow table expects positive tax amounts. The fix needs to take Math.abs() of paystub tax values when using them as actuals, just like the code already does for benefits/retirement deductions.
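The sign flip itself is one line; a sketch with an illustrative function name:

```typescript
// Paystub deductions are negative; the budget flow table wants
// positive magnitudes, matching the existing benefits/retirement handling.
function taxActualFromPaystub(ytdTax: number): number {
  return Math.abs(ytdTax);
}
```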

The provenance system works as a “derivation chain” — each numeric value in the table gets a list of steps showing how it was computed from raw data. This is similar to how spreadsheet audit trails work: you can trace any cell back to its source inputs and the transformations applied. The key design choice is attaching provenance per-column (plan, actual, projected, delta, prev) since each column may derive from different sources.

The provenance system is built as a “shadow” data structure alongside the existing lines array. Each time a PartialLine is pushed, a corresponding Provenance object is stored in a parallel array. This approach avoids changing the PartialLine type (which would require adding provenance to every push call) while keeping the provenance logic co-located with the data it describes. During the post-processing pass, provenance and projected values are merged into the final FlowLine objects.

The badge system uses the first step of the actual provenance chain to determine the data source, creating a direct connection between the visual indicator and the underlying derivation. This is more reliable than the old source string parsing because it’s derived from the same structured data that powers the expansion panel.

The budget-flow.ts grew from ~930 to ~1316 lines. The provenance chains add ~40% more code alongside the existing logic, but all of it is additive — the existing data flow, PartialLine pushes, and post-processing remain unchanged. The provenance array (provs[]) is always kept in sync with lines[] by pushing to both arrays at every insertion point, and they’re joined during the post-processing map where projected/provenance are merged into FlowLine objects. This “parallel array” pattern keeps the existing code readable while adding the new capability.
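The parallel-array pattern can be sketched with heavily simplified types (the real PartialLine/FlowLine shapes have many more fields):

```typescript
type PartialLine = { label: string; actual: number };
type Step = string; // e.g. "paystub YTD federal tax, abs()"
type Provenance = { actual: Step[] };
type FlowLine = PartialLine & { provenance: Provenance };

const lines: PartialLine[] = [];
const provs: Provenance[] = [];

// Every insertion point pushes to both arrays, keeping the
// invariant provs.length === lines.length.
function pushLine(line: PartialLine, prov: Provenance): void {
  lines.push(line);
  provs.push(prov);
}

// Post-processing pass: zip the shadow array back onto the lines.
function finalize(): FlowLine[] {
  return lines.map((line, i) => ({ ...line, provenance: provs[i] }));
}
```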

For biweekly pay, calendar-fraction projection is inaccurate because paychecks don’t land evenly across months. A Jan 30 pay period end could mean 2 of 26 pay periods have been received. The YTD/period earnings ratio is a clean signal: it tells us exactly how many pay periods are in the YTD figures, and with biweekly pay we know there are 26 per year, so the projection factor is simply 26 / periodsElapsed.

The key insight is that biweekly pay gives exactly 26 or 27 pay periods per year, depending on where the first pay date lands. We can compute the total by deriving the first period end date from any uploaded paystub (using its period number and date), then counting how many 14-day intervals fit in the calendar year. This is stored per-paystub at upload time so the projection factor is always total_periods / current_period with accurate values for both.
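A sketch of that derivation, assuming 14-day periods and that the paystub carries its period number and pay_period_end date (field names are illustrative): step back to the first period end of the year, then count 14-day steps that land inside the calendar year.

```typescript
function totalBiweeklyPeriods(periodNumber: number, periodEnd: Date): number {
  // Derive the first period end of the year from this paystub.
  const first = new Date(periodEnd);
  first.setDate(first.getDate() - 14 * (periodNumber - 1));

  // Count how many 14-day period ends fall within the calendar year.
  const year = periodEnd.getFullYear();
  let count = 0;
  const d = new Date(first);
  while (d.getFullYear() === year) {
    count++;
    d.setDate(d.getDate() + 14);
  }
  return count;
}
```

With this in hand, the projection factor is simply `totalBiweeklyPeriods(...) / periodNumber`. A first period end of Jan 1 yields 27 periods; most other alignments yield 26.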

The pay-period-based projection (totalPeriods / currentPeriod) is more accurate than calendar-fraction for biweekly earners because pay isn’t evenly distributed across calendar days. Early in the year, a calendar-fraction approach underestimates (you might have 2 pay periods in January but the calendar says only 1/12 of the year has passed). The period-based approach correctly says “you’ve received 2 of 26 paychecks, so multiply by 13.”

Right now, every non-BALANCES line uses the same projectionFactor = totalPeriodsInYear / periodsElapsed (pay-period-based). But spending is driven by calendar time, not paychecks. If you’re 2 pay periods into the year (~4 weeks), the spending projection should use 12 / monthsElapsed rather than 26 / 2. These two factors can diverge significantly early in the year.

The divergence between factors (13.0x vs 12.6x) is real and meaningful. Two biweekly pay periods cover exactly 28 days, but calendar-wise Jan 30 is 29/365 = 7.9% of the year. Income arrives in discrete chunks (paychecks) while spending is continuous, so they need different annualization methods. The provenance chains make this visible: income rows show “26 / 2 pay periods” while spending rows show “12 / 1 months elapsed.”
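The two factors side by side, mirroring the Jan 30 numbers above (the day-of-year form of the spending factor is equivalent to 12 / monthsElapsed):

```typescript
// Income: discrete paychecks, so annualize by pay periods.
function incomeFactor(totalPeriods: number, currentPeriod: number): number {
  return totalPeriods / currentPeriod; // 26 / 2 = 13.0
}

// Spending: continuous over calendar time, so annualize by day fraction.
function spendingFactor(dayOfYear: number, daysInYear = 365): number {
  return daysInYear / dayOfYear; // 365 / 29 ≈ 12.6
}
```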


courses

Restructured the gpu-scheduling-viz and gavel repos, separating JavaScript frontend code from Python backend tools into proper src/ and tools/ directories. Cleaned up approximately 36GB of unnecessary data (duplicate viz directories, intermediate logs, and debug output). Then made all Gavel+FGD experiments visualizable by building a manifest-driven experiment browser with tag-based filtering and fixing the viz tool to handle telemetry-only data files that lack per-GPU allocation details.

Why this restructure matters: The current flat layout mixes JavaScript frontend code (11 files) with Python backend tools (4 files) in the root directory. Separating into src/ and tools/ follows the common “language-based separation” pattern — JavaScript modules stay co-located so their relative ./ imports remain valid, and Python modules get their own sub-package so they can be imported cleanly via viz.tools.*. The key insight: because index.html serves as the web root and fetch() calls resolve relative to the page URL (not the script’s location), data paths like data/foo.viz.bin still work without changes.

Space savings breakdown: The gavel repo has significant bloat from three sources: (1) cluster/viz/ is a full 5.4GB duplicate of the gpu-scheduling-viz data directory — likely created when the viz tool was developed inside the gavel cluster runner, (2) experiments/fgd/telemetry/*_viz.log are 6.1GB intermediate logs that were preprocessed into .jsonl files, and (3) src/scheduler/logs/ contains 19GB of local debug run output (much more than the plan estimated). The .gitignore already excludes **/logs/ from git, so these are purely local disk waste.

The trap of “unused” upstream code: set_queue.py looks like a standalone utility, but scheduler.py actually uses set_queue.SetQueue() (line 173) for managing worker IDs. This is a data structure (a queue that also supports set operations), not the queue-management tool it appeared to be from its name. The lesson: never delete files based on assumptions about their purpose — always grep for actual usage.

Space reclaimed summary: The restructuring deleted approximately 36GB of unnecessary data — significantly more than the plan’s 17GB estimate because scheduler/logs/ turned out to be 19GB (not “a few MB”). The breakdown:

  • gpu-scheduling-viz/data/*.log: ~5.2GB
  • gavel/cluster/viz/: ~5.4GB
  • gavel/experiments/fgd/telemetry/*_viz.log: ~6.1GB
  • gavel/src/scheduler/logs/: ~19GB (biggest surprise)
  • Stale notebooks/figures + duplicate papers: ~16MB

Key lesson learned: set_queue.py looked like an “upstream runtime tool” from its name, but it’s actually a data structure (SetQueue) used by the core scheduler. The pre-commit hook caught this, reinforcing the importance of always grepping for usage before deleting files.

The viz binary format packs per-round data as fixed-size records (28-byte base + 2 bytes per GPU type + 4 bytes per GPU for allocations). For the 6200-GPU Alibaba cluster, each round is ~25KB. The telemetry-only files use zero-filled allocation arrays since we only have aggregate stats, not per-GPU placement data. This means the heatmap tab will be blank, but all timeseries charts work perfectly.
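A quick sanity check of the quoted record size (the GPU-type count is an assumption; any small number gives the same ~25KB ballpark):

```typescript
// 28-byte base + 2 bytes per GPU type + 4 bytes per GPU.
function roundRecordBytes(numGpus: number, numGpuTypes: number): number {
  return 28 + 2 * numGpuTypes + 4 * numGpus;
}

// 6200-GPU Alibaba cluster, assuming 3 GPU types: 24,834 bytes ≈ 25KB.
const bytesPerRound = roundRecordBytes(6200, 3);
```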

Design pattern: Manifest-driven UI. Instead of file pickers that require the user to know filenames, we use a manifest.json that acts as a metadata index. The JS loads this on startup and builds the filter UI dynamically from the tag vocabulary. This means adding new experiments later only requires updating the manifest — no code changes needed.

Tag-based AND filtering. When multiple tags are active, the filter uses AND logic (experiment must match ALL active tags). This is the natural behavior for narrowing down: clicking “alibaba” + “gavel” gives you only Gavel experiments on the Alibaba trace. Users can deselect tags to widen the filter.
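The AND filter reduces to one predicate. A sketch with an illustrative manifest entry shape (experiment names below are made up):

```typescript
type Experiment = { name: string; tags: string[] };

// An experiment passes only if it carries EVERY active tag.
function filterExperiments(all: Experiment[], active: Set<string>): Experiment[] {
  return all.filter((e) => [...active].every((t) => e.tags.includes(t)));
}
```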

Layout principle: separate concerns into rows. The old design jammed three conceptually different things (branding, experiment selection, playback) into one flex row. The fix gives each its own horizontal band:

  1. Header bar: title + playback controls (compact, always visible)
  2. Experiment browser panel: two-column grid with tags + dropdowns (has room to breathe)
  3. Timeline: scrubber + phase markers

Each section scales independently. The experiment browser uses grid-template-columns: 1fr 1fr which collapses to 1fr on narrow screens.

Why a root-level CLAUDE.md matters: With three interrelated repos (courses, gavel, gpu-scheduling-viz), the biggest risk isn’t code bugs — it’s organizational drift. Future sessions might not know that viz is a symlink, that Python imports changed from viz.* to viz.tools.*, or that set_queue.py looks deletable but isn’t. The CLAUDE.md acts as a “repo constitution” that persists across conversations, encoding the hard-won lessons from today’s restructuring directly where Claude will see them.

All 9 Gavel+FGD experiments now have visualizable data. The 180/360 jph runs are partial (they timed out at the 4-hour SLURM limit), but the ~900 and ~480 rounds respectively still capture meaningful behavior — enough to see utilization curves, queue dynamics, and fragmentation patterns during the overloaded regime. Comparing these partial runs against the matching baseline and Gavel-only experiments at the same load levels will show whether FGD’s fragmentation-aware placement helps under high contention.

The original if (!file) return; guard silently ignored the blank selection, so once two experiments were loaded, there was no way to go back to single-sim mode. Now selecting “-- select experiment --” triggers model.clearSimulation(), which fires the simulationCleared event, hides the section, applies the single-sim CSS class, and collapses the queue panel — restoring the single-simulation view.

The root cause: telemetry-only .viz.bin files have numJobs = 0 and all-zero allocation arrays (no per-GPU job IDs), because the SLURM logs only contained TELEMETRY summary lines, not the individual allocation/arrival/completion EVENTs. The metrics code was deriving Running/Queued counts and heatmap segments from the per-GPU allocation data, which was empty. The fix uses sim.header.numJobs > 0 to detect whether full allocation data is available. When it’s not, the code falls back to the aggregate telemetry fields (jobsRunning, jobsQueued, gpuUsed[t]). The heatmap bars show a single “active” color instead of per-job-category colors.
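A sketch of the fallback branch. The field names follow the description above but are assumptions about the real structs, and the full-data path is simplified to the running-count derivation:

```typescript
type Sim = {
  header: { numJobs: number };
  telemetry: { jobsRunning: number; jobsQueued: number };
  perGpuJobIds: Int32Array; // all zeros in telemetry-only files
};

function jobCounts(sim: Sim): { running: number; queued: number } {
  if (sim.header.numJobs === 0) {
    // Telemetry-only file: fall back to the aggregate TELEMETRY fields.
    return { running: sim.telemetry.jobsRunning, queued: sim.telemetry.jobsQueued };
  }
  // Full file: count distinct nonzero job IDs across the per-GPU array.
  return {
    running: new Set(sim.perGpuJobIds.filter((id) => id !== 0)).size,
    queued: sim.telemetry.jobsQueued,
  };
}
```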


openclaw

Built an email poller that bridges Gmail to OpenClaw via Himalaya CLI, using a “oneshot + timer” systemd pattern for reliability. Debugged a Signal delivery failure in openclaw agent --deliver and worked around it with a two-step approach: agent generates the response, then message send delivers it separately. Implemented dual idempotency (Gmail label move + local state file) to prevent double-processing of emails.

Email poller design: This script follows the “oneshot + timer” systemd pattern — the service runs once, does its work, and exits. The timer triggers it every 5 minutes. This is simpler and more robust than a long-running daemon because systemd handles scheduling, logging, and crash recovery. The dual idempotency layer (move-to-folder + state file) protects against the edge case where Himalaya moves the email but the script crashes before recording it in state.

Systemd timer design choices: Persistent=true means if the machine was off or asleep when a timer would have fired, it runs immediately on wake-up — important for a laptop/Surface Pro that may sleep. RandomizedDelaySec=30 jitters the exact firing time to avoid thundering herd problems if multiple timers share the same schedule. The service has no [Install] section because it’s only triggered by the timer, never enabled directly.
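A sketch of the unit pair described above (unit names and the ExecStart path are illustrative, not the real files):

```ini
# email-poller.service — oneshot, no [Install] section: only the timer starts it.
[Unit]
Description=Poll Gmail and forward to OpenClaw

[Service]
Type=oneshot
ExecStart=/usr/local/bin/email-poller

# email-poller.timer — fires every 5 minutes, catches up after sleep.
[Unit]
Description=Run the email poller every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true
RandomizedDelaySec=30

[Install]
WantedBy=timers.target
```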

Idempotency in polling systems: The email poller uses two independent layers to prevent double-processing: (1) moving the email to a “processed” Gmail label so it won’t appear in future inbox listings, and (2) a local state file tracking processed envelope IDs. Layer 1 is the primary mechanism; layer 2 handles the edge case where Himalaya’s move succeeds on Gmail’s side but the script crashes before the next iteration sees the moved email. The state file caps at 500 IDs to prevent unbounded growth — old IDs naturally expire since they’ll never appear in inbox listings again anyway.
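A sketch of the state-file layer (helper name is illustrative; the 500 cap is from the description): dedupe new envelope IDs and trim the oldest entries past the cap.

```typescript
const MAX_IDS = 500;

// Returns the updated ID list: skips duplicates, appends new IDs,
// and keeps only the most recent MAX_IDS entries.
function recordProcessed(ids: string[], newId: string): string[] {
  if (ids.includes(newId)) return ids; // already handled
  const next = [...ids, newId];
  return next.length > MAX_IDS ? next.slice(next.length - MAX_IDS) : next;
}
```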

The openclaw message send bridge pattern: Rather than having the poller classify emails (is this a podcast request? a research task?), it forwards the raw body to the gateway with an [Email: ...] prefix. This keeps the poller dead simple — it’s just a transport bridge. The OpenClaw agent on the receiving end has the intelligence to parse natural language instructions and route them to the right skill (podcast generation, research, etc.).

Himalaya CLI argument ordering: Himalaya uses <TARGET> <ID>... for message move — the destination folder comes before the envelope IDs. This is a bit unusual (most CLIs put the “what” before the “where”), but it makes sense because Himalaya supports moving multiple IDs at once, so the variadic <ID>... argument must come last.

openclaw message send vs openclaw agent: message send is a dumb pipe — it delivers a raw message directly to a recipient. openclaw agent sends a message to the AI agent, which interprets it using its system prompt, skills, and tools, then optionally delivers its reply via --deliver. The agent here has the article-podcast skill, so when it receives an email about blog articles, it can figure out the user wants podcasts and ask for confirmation before acting.

The gateway logs confirm every Signal delivery attempt fails with Signal RPC -1: Failed to send message, including the heartbeat. Yet direct signal-cli RPC calls work perfectly. This points to a bug in how the OpenClaw gateway constructs Signal RPC requests — possibly sending the UUID with the uuid: prefix that signal-cli doesn’t expect in the recipients array, or a different API method signature.

OpenClaw has two separate Signal delivery paths: message send (core messaging, works) and agent --deliver (broken, Signal RPC -1). By splitting “generate response” from “deliver response” into two steps, we bypass the bug and get reliability. This is a common pattern — decoupling operations that can fail independently makes systems more robust.

The two-step approach (agent generates + message send delivers) works around the openclaw agent --deliver bug. The _extract_agent_response parser handles the messy stdout that includes Config warnings and other non-JSON output by scanning for the last complete JSON object. This is more robust than trying to split on newlines or find the first {.
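A sketch of that "last complete JSON object" scan, reimplemented from the description rather than copied from the real _extract_agent_response, and assuming the JSON is the final thing on stdout: walk the last `{` candidates right to left until one parses.

```typescript
function extractLastJson(stdout: string): unknown | null {
  let i = stdout.lastIndexOf("{");
  while (i !== -1) {
    try {
      // Parses only if this "{" opens a complete object running to EOF
      // (JSON.parse tolerates trailing whitespace).
      return JSON.parse(stdout.slice(i));
    } catch {
      if (i === 0) break;
      i = stdout.lastIndexOf("{", i - 1); // keep scanning left
    }
  }
  return null; // no complete JSON object found
}
```

Scanning from the right naturally skips config warnings and other noise before the payload, and stepping leftward past inner braces handles nested objects.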