Bhavana AI

AI/ML insights

Dev Log: February 3, 2026

courses

Worked on integrating FGD (Fragmentation Gradient Descent) placement into the Gavel scheduler simulator. Started by validating that the enable_fgd feature toggle keeps existing behavior unchanged when disabled, then fixed a bug where multi-GPU jobs spanning multiple servers were failing placement under FGD’s single-node semantics. Ran Phase D experiments comparing Random, BestFit, and FGD placement modes under FIFO scheduling, confirming that FGD produces the lowest fragmentation (1.88 vs 2.00 for Random) while maintaining identical JCTs at moderate load. Also tracked down an infinite loop in the simulate loop caused by stale next_job_arrival_time state after placement failures.

  • The integration test confirms that adding the enable_fgd flag and the Phase 2 placement branching in scheduler.py has zero impact on existing behavior when the flag is off. This is the key property of a feature toggle — new code paths are completely inert until activated.
  • The FGD placement tests validate the adapter pattern: Gavel’s flat worker-ID space gets restructured into FGD’s hierarchical Node/GPU model, placement decisions happen in FGD’s domain, then results get translated back to worker IDs.
  • FGD was designed for single-node placement decisions. In the FGD paper, a “task” fits on one node. But Gavel supports multi-GPU jobs that span multiple servers (scale_factor=8 with 4 GPUs/server needs 2 servers).
  • The fix is to detect when no single node is large enough and fall through to the multi-node greedy path rather than skipping the job entirely.
  • The identical JCTs make sense for FIFO: FIFO allocates 100% of each job’s resources to one worker type, and Gavel’s throughput model only depends on the worker type, not which specific GPU within that type. So whether job X lands on v100 server 0 or v100 server 3 is irrelevant to throughput — all v100 GPUs are identical.
  • FGD placement matters when there’s resource contention caused by fragmentation: jobs that could run can’t because GPUs are in the wrong places. At lambda=1.0 with 100 jobs on a 108-GPU cluster, the utilization is moderate enough that placement rarely causes contention.
  • The fragmentation metric (0.27) is being tracked correctly. The real test will come in Phase E/F with higher load and heterogeneity-aware policies where placement decisions interact with the allocation algorithm.
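The fallback logic above can be sketched as follows. This is a hedged illustration, not Gavel's or FGD's actual code: the function name, the per-node free-GPU representation, and the use of simple best-fit as a stand-in for FGD's fragmentation-gradient scoring are all assumptions for the sake of the example.

```python
# Illustrative sketch: try single-node placement (FGD semantics), and when no
# single node is large enough, fall through to a multi-node greedy assignment
# instead of skipping the job entirely. Best-fit here stands in for FGD's
# actual fragmentation-gradient scoring.

def place_job(nodes, gpus_needed):
    """nodes: list of free-GPU counts per server.
    Returns a {node_index: gpus} assignment, or None if unplaceable."""
    # Single-node path: the whole job must fit on one node.
    if any(free >= gpus_needed for free in nodes):
        best = min(
            (i for i, free in enumerate(nodes) if free >= gpus_needed),
            key=lambda i: nodes[i],  # tightest fit among feasible nodes
        )
        return {best: gpus_needed}

    # No single node is large enough: multi-node greedy fallback.
    assignment, remaining = {}, gpus_needed
    for i in sorted(range(len(nodes)), key=lambda i: -nodes[i]):
        if remaining == 0:
            break
        take = min(nodes[i], remaining)
        if take > 0:
            assignment[i] = take
            remaining -= take
    return assignment if remaining == 0 else None
```

A scale_factor=8 job on 4-GPU servers exercises the fallback: no node has 8 free GPUs, so it spans two servers.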

The simulate loop has two clocks: running_jobs (a heap of finish times) and next_job_arrival_time. When both are empty/None, the loop should exit. But the job generation code uses break to exit the inner while loop without clearing next_job_arrival_time, creating a subtle infinite loop when placement failures leave jobs permanently unplaceable.
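A minimal reproduction of the two-clock structure, with the fix inlined. Job generation and the heap contents are simplified stand-ins for the real simulate loop; the key point is that the arrival clock is always advanced (or cleared) after a failed placement, rather than left stale by a bare break.

```python
# Two clocks: a heap of finish times and the next arrival time. The loop exits
# only when the heap is empty AND the arrival clock is None.
import heapq

def simulate(arrivals, duration=10):
    """arrivals: list of (arrival_time, placeable) tuples, sorted by time."""
    running_jobs = []  # heap of finish times
    pending = list(arrivals)
    next_job_arrival_time = pending[0][0] if pending else None
    completed = 0
    while running_jobs or next_job_arrival_time is not None:
        if running_jobs and (next_job_arrival_time is None
                             or running_jobs[0] <= next_job_arrival_time):
            heapq.heappop(running_jobs)
            completed += 1
            continue
        # Arrival clock fires: try to place the next job.
        _, placeable = pending.pop(0)
        if placeable:
            heapq.heappush(running_jobs, next_job_arrival_time + duration)
        # The bug: `break`-ing out here on a placement failure leaves
        # next_job_arrival_time stale, and the outer loop spins forever.
        # The fix: unconditionally advance (or clear) the arrival clock.
        next_job_arrival_time = pending[0][0] if pending else None
    return completed
```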

Phase D Results — FGD Placement Validation (FIFO + single-node semantics)

Metric             | Random  | BestFit | FGD
Avg JCT (s)        | 186,353 | 185,586 | 185,586
Completed/Failed   | 94/6    | 94/6    | 94/6
Avg Fragmentation  | 2.00    | 1.92    | 1.88

Key observations:

  • 6 jobs failed placement across all modes — these are multi-GPU jobs needing >4 GPUs that can’t fit on any single 4-GPU server. This is the correct behavior under FGD’s single-node placement semantics.
  • Fragmentation ordering matches the FGD paper: Random (worst) > BestFit > FGD (best). FGD reduces fragmentation by 6% over Random.
  • JCT difference is small (0.4%) between Random and BestFit/FGD, because with FIFO allocation the placement strategy only affects which server hosts a job, not which jobs run. The JCT impact grows under contention.
  • BestFit and FGD produce identical JCTs, suggesting the allocation decisions are the same — only fragmentation differs.

tools

Explored Podcastfy and NotebookLM for building an automated podcast generation pipeline. Investigated how Podcastfy separates content understanding from voice synthesis, discovered that LiteLLM natively supports OpenRouter for model routing, and identified that the transcript_file parameter allows decoupling the LLM and TTS steps entirely. Dug into NotebookLM’s actual audio generation (8+ minutes per piece, M4A format despite the .mp3 extension). Built out the OpenClawd podcast plugin with an async job queue, filesystem-based atomicity, and RSS feed generation, while carefully preserving prior production fixes for audio format handling and timeouts.

  • Podcastfy’s architecture separates content understanding (LLM) from voice synthesis (TTS), which means you can swap either component independently — use a cheap LLM with premium voices, or a powerful LLM with free Edge TTS.
  • The “longform” mode uses a chunking strategy where it breaks content into pieces and generates conversation per chunk while maintaining coherence across chunks — this is how it handles long research papers without losing context.
  • Edge TTS is the zero-cost option since it requires no API key, making it viable for a high-volume personal feed.
  • Podcastfy uses LangChain’s ChatLiteLLM as its fallback for non-Gemini models. LiteLLM natively supports OpenRouter — you just prefix the model name with openrouter/ and set OPENROUTER_API_KEY.
  • So we don’t need to hack anything. We pass model_name="openrouter/anthropic/claude-sonnet-4" (or whichever model OpenClawd routes to) and api_key_label="OPENROUTER_API_KEY", and LiteLLM handles the rest.
  • For local mode, it uses Llamafile on localhost:8080, but that’s not what we want here.
  • Podcastfy’s transcript_file parameter lets us decouple the LLM and TTS steps entirely. OpenClawd generates the conversational transcript using its own Antigravity/Codex/Copilot routing, then Podcastfy just does the TTS from that transcript.
  • This means zero dependency on Podcastfy’s LLM integration. We use OpenClawd for what it’s good at (LLM routing with fallbacks) and Podcastfy for what it’s good at (multi-speaker TTS).
  • Podcastfy’s transcript format uses <Person1> and <Person2> XML tags to mark speaker turns. The regex splits on these to create ordered (speaker1_text, speaker2_text) tuples.
  • Each tuple gets TTS’d separately with different voice IDs, then the audio segments are concatenated sequentially.
  • The entire “framework” boils down to: parse tags, call edge_tts.Communicate() per segment with alternating voices, concatenate with pydub. That’s maybe 60-80 lines of Python.
  • notebooklm-py is NOT generating audio itself — it’s a Python wrapper around Google’s actual NotebookLM backend via undocumented RPC calls. When you call generate_audio(), it’s hitting Google’s servers and getting back the same audio that the NotebookLM web UI produces.
  • This means you get NotebookLM’s exact quality (same voices, same conversation style) without clicking through the UI.
  • The tradeoff: it depends on undocumented APIs that Google can break anytime, and requires browser-based auth (Google cookies).
  • The existing notebooklm-py skill already does the generate side: create notebook, add URL, generate audio, download MP3. We don’t need to rewrite any of that.
  • What’s missing from the existing skill is the publish side: uploading to Azure, updating RSS, and the Spotify pipeline.
  • So our plugin shouldn’t duplicate notebooklm-py’s skill — it should extend it by adding the publishing layer on top.
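The tag-parsing step described above is small enough to sketch. The regex and tuple shape follow the <Person1>/<Person2> convention; the TTS call is injected as a parameter so the sketch stays runnable without edge_tts installed (the real loop would call edge_tts.Communicate(text, voice) per segment and concatenate with pydub), and the voice names are illustrative defaults.

```python
# Parse <PersonN>...</PersonN> speaker turns, then synthesize each turn with
# an alternating voice. `synth` is injected; in production it would wrap
# edge_tts.Communicate and the segments would be concatenated with pydub.
import re

TURN_RE = re.compile(r"<Person([12])>(.*?)</Person\1>", re.DOTALL)

def parse_transcript(text):
    """Return an ordered list of (speaker, utterance) pairs."""
    return [(f"Person{m.group(1)}", m.group(2).strip())
            for m in TURN_RE.finditer(text)]

def render(turns, synth,
           voices={"Person1": "en-US-GuyNeural",
                   "Person2": "en-US-JennyNeural"}):
    """Call synth(text, voice) once per turn, in transcript order."""
    return [synth(text, voices[speaker]) for speaker, text in turns]
```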

The plugin scaffold follows the OpenClawd plugin convention: openclaw.plugin.json declares the plugin metadata and config schema, package.json declares the npm package with an openclaw field for extension registration, and index.ts is the entry point that receives the API object. The configSchema uses JSON Schema so OpenClawd can validate user config and generate UI hints for settings pages.

The SKILL.md frontmatter description field is the key to intent detection. OpenClawd matches user messages against skill descriptions to decide which skill to invoke. The description is intentionally verbose with trigger phrases (“podcast this”, “queue this up”) so the LLM can match a wide range of natural language patterns. The ${CLAUDE_PLUGIN_ROOT} variable resolves at runtime to the plugin’s install directory, making script paths portable across machines.

The generate script uses a heuristic-based content classifier rather than an LLM call for classification. This is a deliberate design choice: domain matching and keyword counting is fast, deterministic, and free. The is_technical() function checks both the URL (arxiv.org, github.com, etc.) and the page title against keyword lists. Requiring 2+ keyword matches for title-based classification reduces false positives — a single mention of “API” in a cooking blog won’t trigger technical instructions.
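A hedged reconstruction of that classifier; the domain and keyword lists here are illustrative, not the script's actual lists.

```python
# Heuristic classifier: domain match is decisive on its own; title-based
# classification requires 2+ keyword hits to cut false positives.
TECH_DOMAINS = ("arxiv.org", "github.com", "news.ycombinator.com")
TECH_KEYWORDS = ("api", "gpu", "compiler", "kernel", "neural", "scheduler")

def is_technical(url, title):
    if any(domain in url for domain in TECH_DOMAINS):
        return True
    # One stray "API" in a cooking blog won't trigger: need >= 2 hits.
    words = title.lower().split()
    hits = sum(1 for kw in TECH_KEYWORDS if kw in words)
    return hits >= 2
```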

Python’s xml.etree.ElementTree has a subtle namespace interaction: ET.register_namespace("itunes", URI) tells the serializer to use the itunes: prefix for that namespace. When you also set xmlns:itunes as an explicit attribute on the element, tostring() outputs both — the registered namespace declaration and the explicit attribute — causing a “duplicate attribute” parse error. The fix is to rely solely on register_namespace() and let the serializer emit the declaration automatically.
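A minimal reproduction of the fix: register the namespace and let the serializer emit the xmlns:itunes declaration itself, without also setting it manually on the root element.

```python
# register_namespace tells the serializer to use the itunes: prefix and to
# emit the xmlns:itunes declaration automatically on serialization. Setting
# xmlns:itunes manually on <rss> as well is what caused the duplicate.
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"
ET.register_namespace("itunes", ITUNES_NS)

def make_rss():
    rss = ET.Element("rss", {"version": "2.0"})
    # Do NOT add rss.set("xmlns:itunes", ITUNES_NS) here.
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, f"{{{ITUNES_NS}}}author").text = "Bhavana"
    return ET.tostring(rss, encoding="unicode")
```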

The index.ts acts as a bridge between the TypeScript plugin system and the Python scripts. Rather than passing config through environment variables or CLI flags on every invocation, it writes the config to a well-known JSON file at ~/.openclaw/plugins/<id>/config.json during plugin registration. The Python scripts can then read this file via --config. This is a clean separation: TypeScript handles plugin lifecycle, Python handles the actual work.
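The Python side of that bridge reduces to reading one file. A sketch, with the flag name taken from the text and the config fields assumed:

```python
# Read the config JSON that index.ts wrote during plugin registration.
# The scripts receive its path via --config rather than env vars or per-field
# CLI flags.
import argparse
import json

def load_config(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True,
                        help="path to the JSON written by the plugin host")
    args = parser.parse_args(argv)
    with open(args.config) as f:
        return json.load(f)
```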

The E2E test reveals the boundary between what can be automated and what requires human interaction. notebooklm login needs a browser-based OAuth flow (Google auth), and Azure credentials are secrets that should be entered directly. These are both one-time setup steps — once configured, the full pipeline (generate, publish, feed update) runs headlessly via the skill invocation from Signal.

NotebookLM audio generation takes ~8 minutes for a simple essay. A research PDF will likely take longer. The status transitions from in_progress to pending (confusingly) before finally reaching completed. The default 300s timeout in notebooklm-py is inadequate for real usage.
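A generic polling helper reflecting those observations. The get_status callable is injected because notebooklm-py's own API is not reproduced here; the point is that "pending" can follow "in_progress", so only terminal statuses end the loop, and the timeout is the extended 900s rather than the library's 300s default.

```python
# Poll until a terminal status. Because the status confusingly transitions
# in_progress -> pending -> completed, intermediate statuses are not treated
# as failures; only "completed"/"failed" stop the loop.
import time

def wait_for_audio(get_status, timeout=900, interval=10):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)  # in_progress or pending: keep waiting
    raise TimeoutError("audio generation did not finish in time")
```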

LLM agents frequently take shortcuts when given optional-sounding instructions. The original SKILL.md described the scripts but didn’t forcefully mandate their use. The agent (Gemini) interpreted “generates audio” as something it could do itself. The fix: put “CRITICAL” and “MANDATORY” directives at the top, provide complete copy-paste bash commands with env var setup, and remove any ambiguity about what the agent should do. This is a common pattern in agent skill design — you need to be explicit about what is non-negotiable.

The source venv/bin/activate approach is fragile because: (1) activate scripts have hardcoded VIRTUAL_ENV paths that break when the directory is copied, and (2) in non-interactive shells the activation may silently fail. Using the venv’s python binary directly (venv/bin/python3) bypasses all of this — it automatically uses the venv’s site-packages without needing activation.

NotebookLM outputs audio in M4A/MP4 format (AAC codec) despite the .mp3 download extension. This is worth noting for the generate script — it should either convert to MP3 via ffmpeg after download, or we could update the pipeline to serve M4A directly (most podcast apps support it). For now the ffmpeg conversion works and keeps RSS feed compatibility universal.
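The conversion step can be sketched like this; building the ffmpeg argv separately keeps the command inspectable without ffmpeg installed, and the 128k bitrate is an assumed (reasonable podcast) setting, not necessarily what the script uses.

```python
# Convert the downloaded M4A/AAC file to MP3 for universal RSS compatibility.
import subprocess

def ffmpeg_cmd(src, dst):
    # -y: overwrite existing output; libmp3lame: MP3 encoder.
    return ["ffmpeg", "-y", "-i", src,
            "-codec:a", "libmp3lame", "-b:a", "128k", dst]

def convert_to_mp3(src, dst):
    subprocess.run(ffmpeg_cmd(src, dst), check=True)
```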

The plan layers an async job queue on top of a plugin that hasn’t been built yet. The implementation plan (Task 1-8) defines the base plugin, and the async push plan adds job_manager, worker, notifier, and systemd service on top. I’ll build everything together — the base scripts plus the async enqueue path.

The “duplicate attribute” error happens because ET.register_namespace("itunes", ITUNES_NS) makes ElementTree serialize the namespace as xmlns:itunes, but create_feed also manually sets xmlns:itunes as an attribute on the <rss> element. When serialized, both appear, producing a duplicate xmlns:itunes attribute. The fix: remove the manual attribute and let register_namespace handle it.

The existing codebase already had important production fixes from prior sessions: M4A audio format support, email field in RSS feeds, audio type detection, and an extended 900s timeout. My rewrite dropped these. I need to restore the committed versions and apply only the new async features on top, rather than replacing the files wholesale. This is a common pitfall when rebuilding from a plan document that doesn’t reflect incremental fixes.

  • The filesystem job queue pattern (write .tmp then os.rename) provides atomicity on POSIX systems — rename is atomic within the same filesystem, so a partially written file never appears as a valid job.
  • The worker imports generate and publish as Python functions rather than shelling out to the scripts. This avoids subprocess overhead and keeps everything in one process, though it means the worker needs all dependencies (notebooklm-py, azure-storage-blob) in its venv.
  • I preserved all prior production fixes (M4A audio detection, 900s timeout, ffprobe fallback for duration, source metadata title extraction, RSS email/image elements) that were committed in earlier sessions.
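The write-tmp-then-rename pattern from the first bullet, sketched with an assumed queue layout (one JSON file per job):

```python
# Atomic job enqueue: write to a .tmp path, fsync, then os.rename into place.
# rename is atomic within one filesystem on POSIX, so a reader scanning the
# queue directory never observes a half-written job file.
import json
import os

def enqueue(queue_dir, job_id, payload):
    os.makedirs(queue_dir, exist_ok=True)
    final = os.path.join(queue_dir, f"{job_id}.json")
    tmp = final + ".tmp"
    with open(tmp, "w") as f:
        json.dump(payload, f)
        f.flush()
        os.fsync(f.fileno())  # ensure bytes hit disk before the rename
    os.rename(tmp, final)     # atomic publish: the job appears fully written
    return final
```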

openclaw

The openclaw work overlapped heavily with the tools project, building the podcast generation and publishing plugin for the OpenClawd platform. Investigated Podcastfy and NotebookLM integration paths, built the plugin scaffold with TypeScript/Python bridge architecture, implemented the async job queue with filesystem-based atomicity, and refined the skill description for reliable agent invocation.

personal-finance

Worked on the spending detail view for the personal finance dashboard, adding per-category breakdowns under the spending section.

  • The SPENDING_DETAIL lines use a separate section identifier so buildFlow can differentiate them from group-level lines, but the renderer maps them back to SPENDING for display continuity (no extra section header).
  • Detail lines only show the actual column since there’s no per-category plan — the 12-space blank padding replaces the plan column.
  • Negative category amounts (like Auto -$18k) indicate refunds exceeded spending — useful for spotting insurance reimbursements or returns.
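The detail-line layout described above might render like this; column widths and the exact format are assumptions for illustration, with the 12-space pad standing in for the absent plan column.

```python
# Render a per-category spending detail line: category name, a 12-space blank
# where the plan column would be, then the actual amount.
def detail_line(category, actual):
    return f"  {category:<16}{'':12}{actual:>10,.0f}"
```

Negative actuals (refunds exceeding spending) render with a leading minus, e.g. Auto at -18,000.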