Bhavana AI

AI/ML insights

Dev Log: February 1, 2026

courses

Worked on reproducing the FGD (Fragmentation Gradient Descent) GPU scheduler paper. Fixed the fragmentation formula to account for memory and GPU type constraints (not just GPU count), explored why fragmentation metrics behave counterintuitively at low demand, and debugged the per-task-type fragmentation breakdown. Discovered that our per-node binary check was collapsing the nuanced per-(node, task-type) weighted computation, explaining why Fig 9d showed nearly 100% deficient.

The fragmentation formula F_n(m) should count ALL free GPUs as fragmented when a task type can’t run on that node for ANY reason — CPU, memory, GPU type mismatch, or insufficient GPU count. Our code was missing the memory and GPU type checks. In a heterogeneous cluster like Alibaba’s (32GB to 1TB memory per node), the memory constraint becomes binding as tasks get placed and memory gets consumed, causing GPUs to become “stranded” — available but unusable.
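The full per-(node, task) check, including the two conditions we were missing, can be sketched like this (a toy sketch with hypothetical field names, not the paper's code; the fits-case fallback is a simplification):

```python
# Toy sketch of the F_n(m) check described above. Field names are my own
# assumptions, not the paper's or our repro's code.
from dataclasses import dataclass

@dataclass
class Node:
    free_gpus: int
    free_cpu: float      # cores
    free_mem_gb: float
    gpu_type: str

@dataclass
class Task:
    gpu_request: int
    cpu_request: float
    mem_gb_request: float
    gpu_type: str        # required GPU model, e.g. "V100"

def frag_gpus(node: Node, task: Task) -> int:
    """GPUs on `node` that task m cannot use, in the spirit of F_n(m)."""
    if task.gpu_request == 0:
        # CPU-only task: every free GPU is fragmented from its view.
        return node.free_gpus
    cannot_run = (
        node.free_cpu < task.cpu_request
        or node.free_mem_gb < task.mem_gb_request   # the missing memory check
        or node.gpu_type != task.gpu_type            # the missing GPU-type check
        or node.free_gpus < task.gpu_request
    )
    if cannot_run:
        # Task can't run for ANY reason: ALL free GPUs count as fragmented.
        return node.free_gpus
    # Simplification: otherwise only the leftover GPUs that can't form
    # another full allocation of this task count.
    return node.free_gpus % task.gpu_request
```

This is where our bug lived: without the memory and GPU-type branches, a node with 4 free GPUs but no free memory reported zero fragmentation instead of 4.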

The FGD fragmentation formula treats non-GPU tasks specially: when gpu_request = 0, ALL unallocated GPUs are “fragmented” from that task’s perspective because none of them can be utilized by a CPU-only task. If 13% of your workload is non-GPU, that alone creates a 13% baseline fragmentation. As GPUs get allocated, fewer remain to be “fragmented” by non-GPU tasks, so the baseline shrinks — creating the decreasing trend seen in the paper.

Why Fig 7b starts high and drops: The fragmentation/total metric is shaped by two opposing forces: (1) the fragmentation rate rises as GPUs fill up, while (2) the pool of unallocated GPUs shrinks. At low demand, ~14.5% of the workload is non-GPU tasks, and for those tasks ALL free GPUs are “fragmented” (a GPU can’t serve a CPU-only job). As GPUs get allocated, fewer total GPUs remain to be fragmented, so the ratio falls even as the rate rises. This is why the paper notes “there is no clear trend of fragmentation proportion growing or decreasing.”
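The shrinking baseline is easy to sanity-check with a toy calculation (illustrative numbers, not the paper's exact figures):

```python
def non_gpu_floor_pct(non_gpu_share: float, total_gpus: int, allocated: int) -> float:
    """Minimum fragmentation, as a percent of all GPUs, contributed by
    CPU-only tasks: they see every unallocated GPU as fragmented."""
    unallocated = total_gpus - allocated
    return 100.0 * non_gpu_share * unallocated / total_gpus

# With a 14.5% non-GPU share, the floor starts at ~14.5% of the cluster when
# nothing is allocated and decays linearly toward zero as allocation rises.
```

So even a perfect packer shows double-digit measured fragmentation at low demand; the floor is a property of the workload mix, not the scheduler.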

The workload distribution and the task submission list serve completely different purposes. The distribution is a measurement instrument — it defines what “fragmentation” means by saying “given these are the tasks the cluster will see, how many GPU resources are unusable?” The task list is what actually fills the cluster. They come from the same trace but are used differently.

Why “inflation” and not trace replay? The paper uses inflation because it isolates the placement policy’s effect on fragmentation. In a trace replay, tasks arrive and depart — the fragmentation curve depends on arrival/departure patterns, not just placement quality. Inflation removes departures entirely, making it a pure stress test: “as you fill the cluster, how badly does your policy fragment it?”

Why FGD is slower: FGD evaluates fragmentation delta for every candidate node, which means computing F_n(M) twice per node (before and after hypothetical placement). With 1213 nodes and 8 task types in the workload, that’s ~19,000 fragmentation computations per task. Baselines just compare a scalar score.
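The scoring loop behind those numbers looks roughly like this (self-contained toy simplified to GPU and CPU only; all names are mine, not the paper's):

```python
# Toy sketch of FGD's placement rule: pick the node whose hypothetical
# placement increases expected fragmentation least.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Node:
    free_gpus: int
    free_cpu: float

@dataclass(frozen=True)
class Task:
    gpu: int
    cpu: float

def frag(node: Node, task: Task) -> int:
    # All free GPUs are fragmented if this task can't run here at all.
    if task.gpu == 0 or node.free_cpu < task.cpu or node.free_gpus < task.gpu:
        return node.free_gpus
    return node.free_gpus % task.gpu

def F(node: Node, workload) -> float:
    # Workload-weighted expected fragmentation for one node (F_n(M)).
    return sum(p * frag(node, t) for t, p in workload)

def fgd_place(task: Task, nodes, workload):
    # Two F evaluations per candidate node: before and after the
    # hypothetical placement. This is the expensive inner loop.
    best, best_delta = None, float("inf")
    for n in nodes:
        if n.free_gpus < task.gpu or n.free_cpu < task.cpu:
            continue
        after = replace(n, free_gpus=n.free_gpus - task.gpu,
                        free_cpu=n.free_cpu - task.cpu)
        delta = F(after, workload) - F(n, workload)
        if delta < best_delta:
            best, best_delta = n, delta
    return best
```

Note the two F evaluations per candidate node: with 1213 nodes and 8 task types, that inner sum is what makes FGD slower than scalar-scored baselines.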

  • The lambda parameter is an inverse relationship: lambda = 3600 / jobs_per_hr. Higher jobs/hr means smaller lambda (shorter inter-arrival times), which stresses the cluster more.
  • The measurement window (jobs 4000-5000) is a standard technique in simulation: you let the system warm up with ~4000 jobs so it reaches steady state, then measure the next 1000 jobs. This avoids transient startup effects biasing your results.
  • Each round’s allocations are captured between two consecutive TELEMETRY lines. The code resets the allocation array to zeros after each telemetry marker because Gavel re-solves the entire allocation from scratch every round (it is not incremental).
  • The workload distribution serves a dual purpose: it is both the definition of “what counts as fragmented” (for measurement) and the source of tasks being submitted. The paper’s key insight is that fragmentation should be measured relative to what tasks actually look like in production, not in absolute terms.
  • Non-GPU tasks are included in the popularity distribution (lowering all GPU task weights) but never actually submitted in the inflation loop. They affect fragmentation measurement because a node’s CPU consumed by non-GPU tasks would reduce its ability to run GPU tasks.
  • The paper’s fragmentation breakdown is computed the same way as F_n(m) — for each (node, task) pair, you classify why the GPUs are fragmented (insufficient GPU? insufficient CPU? no GPU needed?), then weight by task popularity. Our code instead does a single per-node check (“can ANY task fit?”), which collapses a nuanced per-task-type distribution into a binary answer and loses the breakdown information.
  • This likely explains why our Fig 9d shows nearly 100% deficient: the per-node check finds at least one task type that does have enough CPU, so it never classifies anything as stranded, even though many task types individually would see stranded GPUs on that node.
  • Fragmentation is a property of resources, not of tasks. A rejected task is a consequence of fragmentation, but the metric measures GPU waste.
  • The workload popularity weighting is what makes FGD powerful — it doesn’t just pack bins tightly, it considers what future tasks will look like when deciding where to place the current task.
  • The FGD scheduler minimizes the delta (change) in this fragmentation metric per placement. It asks: “which node placement will increase expected wasted GPUs the least?”
  • The three categories map directly to the paper’s quadrant framework (Fig 4b): non-gpu = x-axis, stranded = Q-IV (has GPU, no CPU), deficient = Q-I/Q-II/Q-III (insufficient GPU capacity).
  • Non-GPU fragmentation acts as a constant “floor” — it’s a property of the workload mix, not the scheduler. This is why the paper includes it: to show what fraction of measured fragmentation is actually controllable.
  • The per-(node, task-type) weighted computation is essential. Our current code does a per-node binary check which is why Fig 9d shows ~100% deficient — it’s collapsing the nuance.
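The difference between the two computations can be sketched as follows (dict-shaped toy inputs; the category names follow the bullets above, the code is mine, not our repro's):

```python
# Per-(node, task-type) breakdown vs. the collapsing per-node binary check.
def classify(node, task):
    if task["gpu"] == 0:
        return "non_gpu"
    if node["free_cpu"] < task["cpu"]:
        return "stranded"       # GPUs present but no CPU to drive them
    if node["free_gpus"] < task["gpu"]:
        return "deficient"      # not enough GPUs for this task type
    return None                  # task fits: contributes no fragmentation here

def breakdown(nodes, workload):
    totals = {"non_gpu": 0.0, "stranded": 0.0, "deficient": 0.0}
    for node in nodes:
        for task, pop in workload:
            cat = classify(node, task)
            if cat:
                totals[cat] += pop * node["free_gpus"]  # weight by popularity
    return totals
```

The collapsing per-node variant stops at the first task type that fits, so a node that strands GPUs for most task types reports nothing, which is exactly how the stranded mass vanishes from our Fig 9d.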

personal-finance

Built a deterministic tax calculation pipeline using Bun and SQLite. Parsed W-2 documents from ADP and Gusto formats, implemented federal bracket math with payroll taxes (SS/Medicare), and added household filing comparison. The system follows an “AI orchestrates, scripts execute” pattern where financial calculations never depend on probabilistic AI output.

Deterministic PDF parsing (using libraries like pdfplumber or tabula) is preferable to AI extraction for recurring documents with a known format. You build the parser once, and it produces identical results every time — no risk of the AI misreading a number or changing its interpretation between runs.
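As a hedged sketch (the Box 1 label and the regex are illustrative assumptions, not a real ADP or Gusto layout):

```python
# Deterministic W-2 field extraction. The box label and regex below are
# illustrative assumptions, not an actual payroll-provider layout.
import re

WAGE_RE = re.compile(r"1\s+Wages, tips, other comp\.?\s+\$?([\d,]+\.\d{2})")

def parse_wages(text: str) -> float:
    """Extract Box 1 wages from already-extracted PDF text."""
    m = WAGE_RE.search(text)
    if not m:
        raise ValueError("Box 1 wages not found")  # fail loudly, never guess
    return float(m.group(1).replace(",", ""))

def parse_w2(path: str) -> float:
    # pdfplumber's text extraction is deterministic for a fixed document.
    import pdfplumber
    with pdfplumber.open(path) as pdf:
        text = "\n".join(p.extract_text() or "" for p in pdf.pages)
    return parse_wages(text)
```

The key property: same PDF in, same number out, and a loud failure instead of a silent guess when the layout changes.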

The “AI orchestrates, scripts execute” pattern is powerful for personal finance because:

  1. Auditability — you can re-run any script with the same inputs and verify it produces the same output. If your tax number looks wrong, you debug a Python function, not an AI prompt.
  2. Composability — the AI can chain scripts in different orders depending on the conversation flow, while each script remains simple and single-purpose.
  3. Trust boundary — financial calculations should never depend on probabilistic AI output. The AI’s job is to figure out what to compute and what questions to ask, not to do the math.

The plan stores the 2025 standard deduction as $31,500 (post-OBBB Act, signed July 2025), but your spreadsheet uses $30,000 (the pre-OBBB amount). Since you’re filing for 2025, the $31,500 number is correct — the spreadsheet was likely created before the law changed. Task 9 specifically validates this discrepancy so we catch it early.

The taxcalc approach gives us a nice separation: fetch_tax_config.py is the bridge between the external data source and our system. It runs once per year (or whenever you want to check for updates), generates a static JSON config, and that config gets committed to the repo. Our tax_calculator.py reads the JSON and does math — it never touches taxcalc directly. This means if taxcalc changes its API or we want to switch data sources, we only change one file.

The "module": "index.ts" field is a leftover from bun init — it points at the generated index.ts, which was deleted. For CLI tools (as opposed to libraries), this field is unnecessary. The "latest" version specifier on @types/bun is another bun init default; since "latest" resolves to a different version over time, it can cause non-deterministic installs.

The implementer discovered that better-sqlite3 doesn’t work with Bun and switched to bun:sqlite, which is Bun’s built-in SQLite binding with a compatible API. This is a sensible adaptation — Bun’s native SQLite support is faster and doesn’t require native compilation. The plan specified better-sqlite3 but bun:sqlite is the correct choice for a Bun project.

  1. The bun:sqlite built-in replaced better-sqlite3 since Bun’s native SQLite is faster and doesn’t need native compilation.
  2. pdf-parse v2 uses a PDFParse class API (not the v1 function API), which the implementer adapted to correctly.
  3. The W-2 parser handles two distinct formats (ADP tab-separated with 3 copies, Gusto with EIN-based line structure) through format detection, not a single universal regex.
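Point 3's format detection might look like this (the heuristics are my assumptions, not the implementer's actual logic):

```python
# Illustrative format detection for the two W-2 layouts named above.
# The thresholds and marker strings are assumptions, not the real parser's.
def detect_w2_format(text: str) -> str:
    if text.count("\t") > 20:          # ADP export: heavily tab-separated
        return "adp"
    if "Employer identification number" in text:
        return "gusto"                  # Gusto: EIN-labelled line structure
    raise ValueError("unknown W-2 layout; refuse to guess")
```

Dispatching to a per-format parser keeps each one simple, and unknown layouts fail fast instead of being half-parsed.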

The tax computation is now fully functional with all three layers: federal bracket math, payroll taxes (SS/Medicare), and household comparison. The filing-jointly recommendation saves this household ~$8,006 because the joint brackets are wider, keeping more income in lower brackets when one spouse earns significantly more than the other.
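The bracket-widening effect is easy to demonstrate with generic progressive-bracket math (the bracket edges below are illustrative placeholders, not real IRS figures):

```python
# Generic progressive-bracket computation (the "federal bracket math" layer).
# Bracket edges are illustrative placeholders, not real IRS figures.
BRACKETS = [(0, 0.10), (20_000, 0.12), (90_000, 0.22), (200_000, 0.24)]

def bracket_tax(taxable: float, brackets=BRACKETS) -> float:
    tax = 0.0
    for i, (lo, rate) in enumerate(brackets):
        hi = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if taxable > lo:
            tax += (min(taxable, hi) - lo) * rate  # only the slice in this bracket
        else:
            break
    return tax

def joint_tax(taxable: float) -> float:
    # Joint brackets are (roughly) twice as wide as single brackets.
    wide = [(2 * lo, rate) for lo, rate in BRACKETS]
    return bracket_tax(taxable, wide)
```

With these toy brackets, $150,000 of taxable income costs $23,600 under the single widths but only $17,200 under doubled-width joint brackets; the saving comes entirely from income sitting longer in the 10% and 12% slices.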


tools / openclaw

Built the exec-grant plugin for OpenClaw, implementing a time-limited sudo-style approval system for LLM agent command execution. The plugin uses WhatsApp (and later Signal) as the approval channel, with systemd timers for tamper-resistant revocation and a self-healing status check as a safety net. Evolved from a binary “full access” grant to granular capability grants, dealt with cross-platform date handling, and debugged several integration issues including the exec-approvals.json format mismatch, npm package naming conflicts, and Signal’s phone-to-UUID migration.

  • The assistant essentially bricked itself: it issued a systemctl --user stop on its own service unit, which killed the process before it could issue a matching start. This is a classic self-update footgun — the process can’t restart itself via systemd because the stop signal terminates it.
  • The rapid SIGUSR1 restart loop (4 restarts in ~4 seconds) happened because each config write triggered a file-watcher reload. This suggests the assistant made multiple config patches in quick succession.
  • The doctor command handled the migration seamlessly — it detected the legacy ~/.clawdbot directory, created ~/.openclaw, symlinked the old path, and migrated the JSON config. This means any old scripts referencing ~/.clawdbot still work.
  • The DM session isolation (per-channel-peer) is important if multiple people message your assistant — without it, person A’s conversation context could leak into person B’s replies.
  • The new OpenClaw gateway install + gateway start commands avoid a repeat of that self-brick: they install the new unit first, then start it, so the running process is never asked to stop itself without a guaranteed restart.
  • The most dangerous combination here is C1 + C4: passwordless sudo + unrestricted agent exec. This means anyone who sends a crafted WhatsApp message could potentially get the LLM to run arbitrary commands as root. Prompt injection via messaging apps is a real attack vector for LLM-powered assistants.
  • The WhatsApp pre-keys (C3) are the cryptographic material that authenticates your WhatsApp Web session. If stolen, an attacker could clone your session and read/send messages as you.
  • Gateway being loopback-only is the biggest thing working in your favor — even though SSH and RDP are exposed, the OpenClaw API itself isn’t directly reachable over the network.
  • This is an “allow-by-default for safe ops, human-in-the-loop for dangerous ops” model. It preserves the assistant’s full capability while preventing catastrophic mistakes.
  • The autoAllowSkills flag automatically allowlists binaries referenced by your installed skills (like gog), so skill usage doesn’t require manual approval.
  • The biggest design decision is what goes in the initial allowlist. Too restrictive and you’ll get spammed with approval requests. Too permissive and the guardrail is meaningless.
  • This is essentially the same pattern as sudo with a timestamp (sudo caches your password for 5-15 minutes). The difference is that the “password” here is a WhatsApp approval, which works when you’re away from the keyboard.
  • The systemd timer approach is important because it means the agent cannot prevent revocation. Even if the LLM were compromised via prompt injection, it can’t stop the timer from firing — systemd-run creates a transient timer unit that’s managed by the init system, not the agent process.
  • One edge case to consider: what happens if the agent is mid-command when the window expires? The running command finishes (you don’t kill running processes), but any new dangerous command after expiry gets blocked.
  • The reason this doesn’t exist yet is probably that most OpenClaw users either run with security: "full" (convenience over safety) or security: "deny" + sandbox (maximum lockdown). The “I want full power but with a conscious approval step” middle ground is underserved.
  • The hardest design question is: what happens if the agent needs to request a second grant while one is already active? Options: extend the timer, deny (one window at a time), or require a new approval. I’d recommend extending the timer — the admin already approved the session, and requiring re-approval mid-task is the same approval fatigue problem.
  • For the upstream PR, the threat model doc (references/threat-model.md) is important. Reviewers will want to understand: what if the agent tricks the admin into approving? What if someone else sends /grant from a different number? What if the agent modifies revoke.sh before the timer fires?
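The self-healing check at the heart of status.sh can be sketched in Python (hypothetical state-file shape with an ISO-8601 expiresAt field; the real plugin is shell):

```python
# Self-healing expiry check: if the OS timer failed to fire, the next status
# call detects the expired grant and revokes. The state-file shape here is a
# hypothetical stand-in for the plugin's actual format.
import json
import datetime as dt

def check_and_revoke(state_path: str, revoke) -> str:
    with open(state_path) as f:
        state = json.load(f)
    # Expects an offset-aware ISO timestamp, e.g. "...+00:00".
    expires = dt.datetime.fromisoformat(state["expiresAt"])
    if dt.datetime.now(dt.timezone.utc) >= expires:
        revoke()  # idempotent: safe even if the systemd timer already fired
        return "revoked"
    return "active"
```

Because revoke is idempotent, it doesn't matter whether the OS timer or the status check wins the race; that's what makes the two layers compose into defense in depth.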

The plan calls for an openclaw.plugin.json manifest and index.ts entry point, but the plugin-structure skill shows Claude Code plugins use .claude-plugin/plugin.json with auto-discovery of commands/skills/hooks directories. I’ll need to reconcile these — the plan’s openclaw.plugin.json and index.ts suggest this is for an “OpenClaw” platform (not a Claude Code plugin), so I’ll follow the plan’s structure faithfully.

The shell scripts need cross-platform date handling (GNU vs BSD date). On macOS, date -v+Nm adds N minutes, while on Linux it’s date -d "+N minutes". The scripts detect the platform and branch accordingly. Also, the timer scheduling uses three strategies: systemd-run (Linux, tamper-resistant), launchd (macOS, OS-level), and sleep & (fallback). The systemd approach is preferred because the agent process cannot cancel it.

  • The alias wraps the claude binary so every new shell invocation passes --dangerously-skip-permissions automatically.
  • To launch without bypass in a specific case, use command claude (bypasses the alias) or pass explicit flags.
  • Within a running session, you can use the /permissions slash command to switch modes without restarting.
  • command claude and \claude are two standard shell techniques to skip alias expansion in zsh/bash.
  • Aliases only apply to the first word of a command, so claude --resume triggers the alias but something like npx claude would not.

The SKILL.md uses ${CLAUDE_PLUGIN_ROOT} for all script paths rather than hardcoded paths. This is the OpenClaw equivalent of Claude Code’s plugin root variable — it resolves at runtime to wherever the plugin is installed, making the skill portable across installations.

The index.ts uses OpenClaw’s api.registerCommand() pattern discovered from the Explore agent. Commands with requireAuth: true mean only the admin phone (from channels.whatsapp.allowFrom) can trigger them. The /grant-status command has no auth requirement so the agent can check status freely. The plugin shells out to the bundled scripts using __dirname to resolve paths, which works because OpenClaw loads plugins via jiti (a TypeScript runtime loader).

Key design points worth noting:

  1. Two-file separation — exec-approvals.json (gateway reads) vs exec-grant-state.json (plugin lifecycle tracking) keeps the gateway’s security enforcement decoupled from grant semantics
  2. Self-healing in status.sh — If the timer fails to fire, the next status check detects the expired timestamp and triggers revoke, providing a secondary safety net
  3. execFileSync over execSync in index.ts — Avoids spawning a shell entirely, passing args as an array. This eliminates shell injection even if ctx.args contained malicious input (the /^\d+$/ regex check adds defense-in-depth)
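Point 3 has a direct Python analog worth keeping in mind for the shell scripts too (hypothetical wrapper; the grant.sh path is illustrative):

```python
# argv-list execution instead of shell strings: the same execFileSync-over-
# execSync principle, in Python's subprocess terms. The grant.sh path and
# wrapper are illustrative, not the plugin's code.
import re
import subprocess

def grant(minutes: str) -> None:
    # The /^\d+$/ defense-in-depth check from the plugin.
    if not re.fullmatch(r"\d+", minutes):
        raise ValueError("minutes must be digits")
    # List form never invokes a shell, so an input like "15; rm -rf ~"
    # would be passed as one literal argument, never interpreted.
    subprocess.run(["./grant.sh", minutes], check=True)
```

Even if the regex were bypassed, the argv-list call leaves shell metacharacters inert; the two checks are independent layers.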

The files field in package.json acts as an allowlist for npm publish — only index.ts, openclaw.plugin.json, and the entire skills/ directory get included in the tarball. This keeps the published package lean (no .git/, no .gitignore, etc.) while ensuring the bundled scripts and SKILL.md are distributed. The openclaw.install block tells the openclaw CLI how to install this plugin from npm.

The systemd timer should have fired at exactly 60s, but the self-healing in status.sh triggered first (at ~75s when I ran the check). This is by design — the self-healing is a safety net. Looking at the timestamps: grant at 19:11:46, revoke at 19:12:59 (73s later). The systemd timer may have fired but the revoke was already done (it’s idempotent). The two-layer approach (OS timer + self-healing on status check) provides defense in depth.

The openclaw message send CLI uses --target (or -t) not --to. The spam of config warnings is because each CLI invocation boots up the full config, and the mismatch warning fires on every config load. The actual error at the bottom is: required option '-t, --target <dest>' not specified. I need to fix the flag name in the scripts.

Adding adminPhone to the plugin’s configSchema is the right approach. OpenClaw plugins define config schemas in their manifest, and the admin configures values in openclaw.json under plugins.entries.<id>.config. The scripts can then read it via openclaw config get plugins.entries.exec-grant.config.adminPhone. This makes the admin phone explicit rather than guessing from the allowFrom array.

The openclaw binary is at ~/.npm-global/bin/openclaw but that directory isn’t on PATH in non-interactive SSH sessions. The gateway process itself likely has PATH set correctly (started via systemd with the user’s profile). The scripts need the full path to work reliably when called from the gateway process or from SSH. Rather than hardcoding a path, I’ll add a preamble that looks for openclaw in common locations.

The audit log at ~/.openclaw/exec-grant-audit.log shows the revoke happened at 19:30:45Z — about 45 seconds after the 19:29:58Z expiry. The systemd timer likely fired on time but the revoke.sh WhatsApp notification took a moment. The self-healing in status.sh provided the secondary trigger here since I called it after expiry. Both safety layers (OS timer + self-healing) are working as designed.

The npm install created a new directory openclaw-plugin-exec-grant (using the npm package name) instead of exec-grant (the manifest ID). Then it tried to add openclaw-plugin-exec-grant as a config entry, which conflicts with the existing exec-grant entry. The plugin ID mismatch is causing real issues now — OpenClaw expects the extension directory name to match the manifest ID. We already have the plugin installed manually as exec-grant/, so this npm install created a duplicate.

The changes needed are:

  1. Config schema: Replace adminPhone (WhatsApp-specific) with generic adminTarget (any channel identifier) and channel (which messaging channel to use). The target format varies by channel — E.164 for WhatsApp/Signal, chat ID for Telegram, channel/user for Slack/Discord.
  2. Scripts: Replace hardcoded --channel whatsapp with the configured channel value.
  3. index.ts: On plugin register, detect missing config and log a setup message so the admin knows what to configure.
  4. The /grant command receives ctx.channel — we know which channel the admin replied from. The grant/revoke notifications should reply on the same channel the command came from, not a preconfigured one. Only request.sh (agent-initiated) needs the configured channel.

The key design choice here: grant.sh and revoke.sh get the channel/target via environment variables (EXEC_GRANT_CHANNEL, EXEC_GRANT_TARGET) passed from the command handler. This way, when an admin replies /grant 15 from Telegram, the confirmation goes back to Telegram — not to a preconfigured WhatsApp number. Only request.sh (agent-initiated, no incoming message context) reads from the plugin config.

The exec-approvals.json on the Surface Pro has a different structure than what the plan assumed. It has version, socket, defaults, and agents fields at the top level (with agents currently empty). The scripts use jq '.agents.main.security = "..."' which will create the path if it doesn’t exist, so that’s fine. But the structure also has a socket field with auth tokens, so the targeted jq edit approach (only touching .agents.main.security) is the right call — a full file overwrite would destroy the socket config.
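The targeted-edit approach, in Python instead of jq (same .agents.main.security path; everything else in the file is left untouched):

```python
# Targeted edit of exec-approvals.json: set one nested key, preserve every
# field we don't know about (version, socket tokens, defaults).
import json

def set_security_mode(path: str, mode: str) -> None:
    with open(path) as f:
        doc = json.load(f)
    # Create intermediate objects if absent, like jq's .agents.main.security = ...
    doc.setdefault("agents", {}).setdefault("main", {})["security"] = mode
    with open(path, "w") as f:
        json.dump(doc, f, indent=2)
```

The load-modify-dump round trip is what makes this safe: a templated full-file write would silently drop the socket auth tokens.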

The doctor output reveals that openclaw auto-derives an “expected” plugin ID from the npm package name (openclaw-plugin-exec-grant) but the manifest’s id field says exec-grant. OpenClaw strips the openclaw-plugin- prefix from package names when deriving plugin IDs, but the doctor check is comparing against the full package name as a hint. The warning is cosmetic — the plugin loaded — but let me check the tail of the output for actual pass/fail status.

OpenClaw derives the plugin ID hint from the npm package name field. It strips common prefixes like openclaw-plugin- to get the canonical ID. The warning appears because the matching logic expects the package name to be exactly the manifest ID, or follow the openclaw-plugin-<id> convention. The plugin info shows it loaded correctly as exec-grant, so this is purely cosmetic. Still, cleaning it up is good practice for when the plugin is installed from npm — the openclaw plugins install command uses this hint for matching.

Key design choices in this rewrite:

  1. ensureAllowlistBaseline() runs on every plugin load (gateway start). It creates exec-approvals.json with a sensible default allowlist if the file doesn’t exist, or sets security: "allowlist" if the file exists but has no security mode. It never overwrites existing fields — preserving the socket/version data we saw earlier.

  2. @clack/prompts is used for the interactive setup because it’s already bundled with OpenClaw (used in their own wizard/onboarding). Dynamic import with a fallback to plain console output means the plugin doesn’t break if the dependency isn’t available.

  3. getTargetHint() provides channel-specific guidance — a WhatsApp user sees “E.164 phone number” with a phone placeholder, while a Slack user sees “channel or user ID” with #approvals. This removes guesswork during setup.

  4. Config is written via openclaw config set rather than directly editing JSON, which respects OpenClaw’s config validation and file locking.

The registerCli callback receives { program, config, workspaceDir, logger } where program is a Commander.js instance. The plugin adds subcommands to it. The commands array in the options tells OpenClaw which top-level command names to expect (so it doesn’t skip them as conflicts). The registrar is called at CLI startup, so the setup command will be available as openclaw exec-grant-setup (or whatever we register).

The Surface Pro version makes a key architectural shift: instead of a single binary “full access” grant, it introduces granular capability grants. Each grant type (email, code, packages, network, etc.) can be independently activated and timed, using the agents.main.grants object in exec-approvals.json. This follows the principle of least privilege — the agent only gets the specific capabilities it needs, not blanket shell access.

The granular grant types system uses two complementary detection mechanisms:

  1. Command pattern matching (GRANT_PATTERNS) — regex patterns that map shell commands to required grant types automatically, so the gateway can enforce access control without trusting the agent’s self-report
  2. File path pattern matching (CODE_PROTECTED_PATHS, SECRETS_PROTECTED_PATHS) — regex patterns that protect sensitive files at the filesystem level, distinguishing between read and write operations

Each grant type gets its own systemd timer unit (exec-grant-revoke-<type>), so revoking one type doesn’t affect others.
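A minimal sketch of the two detection layers (the patterns are illustrative examples, not the plugin's actual GRANT_PATTERNS or protected-path tables, and the himalaya mail CLI is a hypothetical):

```python
# Illustrative command-to-grant-type mapping: the gateway can derive the
# required capabilities itself instead of trusting the agent's self-report.
import re

GRANT_PATTERNS = {
    "packages": re.compile(r"^\s*(npm|pip|apt|brew)\b"),
    "network":  re.compile(r"^\s*(curl|wget|ssh)\b"),
    "email":    re.compile(r"\bhimalaya\b"),  # hypothetical mail CLI
}
SECRETS_PROTECTED_PATHS = re.compile(r"(^|/)\.(env|ssh|aws)(/|$)")

def required_grants(command: str) -> set[str]:
    """Map a shell command to the grant types it needs."""
    needed = {g for g, pat in GRANT_PATTERNS.items() if pat.search(command)}
    # Second layer: path-based protection for sensitive files.
    for token in command.split():
        if SECRETS_PROTECTED_PATHS.search(token):
            needed.add("secrets")
    return needed
```

Commands matching no pattern need no grant at all, which is what keeps the allow-by-default side of the model low-friction.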

Signal setup for OpenClaw requires either signal-cli (a Java-based CLI tool) or a Signal HTTP daemon. Since the Surface Pro doesn’t have Java installed but does have Docker, the cleanest approach is to run signal-cli-rest-api as a Docker container. This container bundles signal-cli with a REST API that OpenClaw’s Signal plugin can connect to via --http-url.

Signal has been migrating from phone-number-based addressing to UUID-based addressing (ACI — Account Identity). Your number +16083207152 resolves to UUID d4e31a04-c781-45d8-ad2c-bb826fc80574. The phone number lookup fails, but addressing by UUID works. This is a known issue with signal-cli when the recipient’s phone number isn’t discoverable via Signal’s directory service (often due to privacy settings).

The gateway’s exec-approvals.json format is completely different from what we assumed. The gateway expects:

  • version: 1 at the top level
  • Allowlist entries as objects with {pattern, lastUsedAt}, not plain strings
  • A socket and defaults section

Our plugin was writing a flat string array (["ls", "cat", ...]) which the gateway ignored entirely. This means the allowlist was never enforced — the bot had unrestricted access all along. The security: "allowlist" field we wrote was in the right place, but our allowlist format was wrong.
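Migrating our flat array to the gateway's shape looks roughly like this (the pattern/lastUsedAt fields follow the observed format; the allow key name and nesting are my guesses at the surrounding structure):

```python
# Convert our flat string allowlist to the gateway's entry-object format.
# {pattern, lastUsedAt} matches the observed file; the "allow" key and the
# agents.main nesting are assumptions about the rest of the schema.
def to_gateway_allowlist(patterns):
    return [{"pattern": p, "lastUsedAt": None} for p in patterns]

def to_gateway_file(patterns):
    return {
        "version": 1,
        "agents": {"main": {"security": "allowlist",
                            "allow": to_gateway_allowlist(patterns)}},
    }
```

The lesson from this bug generalizes: a config the consumer silently ignores is worse than one it rejects, so it's worth verifying the gateway actually enforces the written allowlist, not just that the write succeeds.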

The openclaw config set command fails on +16083207152 because the CLI tries to auto-detect the value type and interprets +16083207152 as a numeric expression rather than a string. This is a UX gap in the setup flow — the interactive exec-grant-setup command would handle this correctly since @clack/prompts returns strings, but the manual config set path breaks. Worth filing upstream or wrapping the config set call in the setup script to force string type.