Bhavana AI

AI/ML insights

Dev Log: February 13, 2026

courses

Continued work on the Gavel scheduler experiments, focusing on heterogeneity-aware allocation and debugging GPU sharing issues. Ran experiments showing that max_min_fairness without heterogeneity awareness fails badly on mixed-GPU clusters, which is a core finding for the paper. Fixed a latent bug in Gavel’s set_queue.py where a missing import caused GPU sharing teardown to crash, and explored the differences between LP-based and FGD scheduling configurations. Also worked on a course deliverable involving document generation with embedded hyperlinks.

This actually IS the research finding — max_min_fairness without heterogeneity awareness performs terribly on a heterogeneous cluster. It can’t properly utilize GPUs across 6 different types, leaving most of the cluster idle. This is exactly what the paper aims to show: FGD placement alone isn’t enough; you need Gavel’s heterogeneity-aware allocation too.

Gavel’s set_queue.py raises Empty on line 20, but Empty isn’t imported from the queue module. This is a pre-existing bug in Gavel’s code that only manifests when GPU sharing causes the same worker ID to be removed twice: fractional jobs share a GPU, so when the first job finishes the shared worker ID gets removed, and when the second finishes the scheduler tries to remove it again.

The bug chain: With GPU sharing, two fractional jobs share the same worker ID. When the second job gets scheduled to that GPU, _remove_available_worker_id tries to remove it again. The scheduler already handles this gracefully at line 3089 (except queue.Empty: return None), but set_queue.py raises bare Empty instead of queue.Empty, so the raise statement itself fails with a NameError that the handler doesn’t catch.

The one-line import fix is the correct solution — it makes set_queue.py raise queue.Empty, which the scheduler’s existing exception handler catches.
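
A minimal sketch of the failure mode and the fix. The class and function names here are illustrative, not Gavel’s actual code; the point is that once Empty is imported, the double removal falls through to the scheduler’s existing handler instead of crashing.

```python
import queue
from queue import Empty  # the one-line fix; without it, `raise Empty` is a NameError


class SetQueue:
    """Toy stand-in for Gavel's set_queue.py: a queue backed by a set."""

    def __init__(self):
        self._items = set()

    def put(self, item):
        self._items.add(item)

    def remove(self, item):
        if item not in self._items:
            raise Empty()  # pre-fix, `Empty` was an undefined name on this line
        self._items.discard(item)


def remove_available_worker_id(q, worker_id):
    """Mirrors the scheduler's handler: a double removal returns None."""
    try:
        q.remove(worker_id)
        return worker_id
    except queue.Empty:  # only catches the raise once Empty is properly imported
        return None
```

The first removal of a shared worker ID succeeds; the second hits the except clause and returns None instead of escaping as a NameError.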

The three scheduling configurations differ fundamentally in how they solve the resource allocation problem. The LP-based policies formulate GPU sharing as a convex optimization (interior-point method), while FGD uses gradient descent on a fragmentation metric. The interaction between allocation policy and placement strategy creates the real complexity differences.
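
To make the FGD side concrete, here is a toy sketch of descending on a fragmentation score. The metric and placement loop below are deliberately simplified stand-ins, not FGD’s actual formulation: free capacity that no pending job can use counts as stranded, and each fractional job is placed wherever it least increases that score.

```python
def fragmentation(free, pending):
    """Toy metric: free GPU capacity too small for any pending job is stranded."""
    smallest = min(pending)
    return sum(f for f in free if f < smallest)


def fgd_style_place(jobs, free):
    """Greedy descent: place each fractional job where it least increases the
    fragmentation score. `jobs` are GPU fractions; `free` is per-GPU spare
    capacity. Returns the remaining free capacity per GPU."""
    for i, demand in enumerate(jobs):
        remaining = jobs[i + 1:] or [demand]  # jobs still waiting for placement
        best_gpu, best_score = None, None
        for g, cap in enumerate(free):
            if cap + 1e-9 < demand:
                continue  # job doesn't fit on this GPU
            trial = list(free)
            trial[g] -= demand
            score = fragmentation(trial, remaining)
            if best_score is None or score < best_score:
                best_gpu, best_score = g, score
        if best_gpu is None:
            raise RuntimeError("no GPU can fit job %d" % i)
        free[best_gpu] -= demand
    return free
```

With jobs [0.5, 0.5] and free capacity [1.0, 0.6], the descent fills the 1.0 GPU completely and leaves the 0.6 GPU as one usable block, where a best-fit heuristic would strand 0.1 of capacity.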

The ExternalHyperlink component in docx-js wraps child TextRun elements with a URL. Word renders these as clickable blue underlined links. When this .docx is uploaded to Google Docs, the hyperlink is preserved, so readers can click through to the YouTube demo directly.


short-projects

Worked on batch devlog generation tooling and various scripting fixes. Fixed a manual arg-parsing bug involving double-processing of flag values, handled macOS compatibility for the timeout command, solved Claude Code nesting detection issues, and ran the full batch generation pipeline to produce 39 historical devlogs from 1,156 insight blocks.

This is a classic manual arg-parsing bug. When you process --flag value in a loop over all args, you must skip the value on the next iteration, or it gets processed twice: once as the flag’s value and once as a positional argument. The argparse module handles this automatically, but for simple scripts with 1-2 flags, the skip_next pattern is a common lightweight alternative.
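
A sketch of the skip_next pattern (the --output flag name is hypothetical, not the actual script’s flag):

```python
def parse_args(argv):
    """Minimal manual parser for one value-taking flag plus positionals."""
    opts = {"output": None}
    positionals = []
    skip_next = False
    for i, arg in enumerate(argv):
        if skip_next:
            skip_next = False  # this arg was already consumed as the flag's value
            continue
        if arg == "--output":
            opts["output"] = argv[i + 1]
            skip_next = True  # without this, the value is re-read as a positional
        else:
            positionals.append(arg)
    return opts, positionals
```

Dropping the skip_next assignment reproduces the bug: the flag’s value shows up again in the positional list.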

  • 41 dates have insight blocks (out of 44 with activity). 3 dates (12/28, 1/10, 1/12) had conversations but no insight blocks, likely simpler sessions
  • After excluding 02/11 and 02/12 (already done), 39 dates need devlogs
  • The insight count varies widely: from 1 (Jan 18) to 81 (Feb 1), reflecting different levels of coding activity

macOS doesn’t ship GNU timeout; it’s part of GNU coreutils, which the BSD-derived macOS userland doesn’t include. You need either brew install coreutils (which provides gtimeout) or a manual background-process-and-kill pattern.
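
If installing coreutils isn’t an option, the same behavior can be sketched portably in Python. The helper name is made up; exit code 124 follows GNU timeout’s convention.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Portable stand-in for GNU `timeout`: kill the child if it runs too long.

    Returns the child's exit code, or 124 (GNU timeout's convention) on timeout.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL; GNU timeout sends SIGTERM first by default
        proc.wait()  # reap the child so it doesn't linger as a zombie
        return 124
```

One behavioral difference worth noting: this kills outright, while GNU timeout terminates gracefully first and only escalates with --kill-after.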

  • claude -p detects nesting via the CLAUDECODE environment variable. When launching claude -p from inside a Claude Code session (e.g., via Bash tool), child processes inherit this variable and refuse to start.
  • env -u CLAUDECODE strips just that variable from the child’s environment, allowing claude -p to run independently while preserving PATH and everything else.
  • This is a common pattern when shell tools detect nesting via environment markers (similar issues exist with TMUX, INSIDE_EMACS, etc.).
  • set -e with background processes: When using set -Eeuo pipefail, any command that returns non-zero exits the script. Background process management (kill, wait) often returns non-zero in normal operation (process already dead, signal delivery race). Always use || true on kill/wait calls in scripts with set -e.
  • Total batch stats: 39 devlogs generated from 1,156 insight blocks, spanning 7 weeks of development activity. Average generation time ~90 seconds per devlog, with larger insight sets (40-80+ insights) taking up to 4 minutes.
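
The env -u behavior is easy to replicate when launching the child from code rather than a shell. A minimal Python sketch (the function name is hypothetical): copy the parent environment, drop the one marker variable, and pass the result to the child.

```python
import os
import subprocess


def run_without_var(cmd, var="CLAUDECODE"):
    """Equivalent of `env -u VAR cmd ...`: inherit everything except one variable."""
    child_env = dict(os.environ)
    child_env.pop(var, None)  # no-op if the variable isn't set
    return subprocess.run(cmd, env=child_env)
```

PATH and the rest of the environment pass through untouched, which is exactly why env -u is preferable to constructing a child environment from scratch.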

openclaw

Debugged authentication and API scope issues across Google services. Investigated NotebookLM session behavior with cookie-based auth via Playwright, and resolved Google OAuth scope configuration for Workspace APIs like Calendar and Gmail. Also dealt with the 7-day token expiry limitation of Google OAuth apps in “Testing” mode.

The NotebookLM library uses cookie-based auth via Playwright’s storage state (saved by notebooklm login). Even though Google cookies have long expiry dates, Google can server-side invalidate sessions. But in this case, 200 OK responses + successful completions confirm the session is still alive. The pattern of some URLs timing out while others succeed suggests a NotebookLM-side issue with specific content, not an auth problem.

gcloud auth application-default login only grants cloud-platform + openid + userinfo.email by default. Google Workspace APIs (Calendar, Gmail, Drive) require explicitly requesting their scopes via --scopes. This is a common gotcha — GCP APIs and Workspace APIs use separate scope families, and the default login doesn’t include Workspace scopes.

Google OAuth apps have three publishing states: Testing, Internal, and Production. In “Testing” mode, refresh tokens expire after 7 days and only registered test users can authenticate, which is a common gotcha for personal projects. There are three ways to deal with it:

  • Move the GCP OAuth consent screen to “Internal” (requires a Google Workspace account)
  • Move to “Production” and submit for verification (overkill for personal use)
  • Or just re-auth every 7 days (the current situation)