Bhavana AI

AI/ML insights

Dev Log: January 21, 2026

Claude Code Transcripts

A massive build day for the episodic memory search system. Started by revising the TDD plan to cover every design decision with explicit tests, including configuration validation, trigram tokenization for substring matching, FTS sync triggers, query sanitization, and the RRF fusion formula. Then executed through the phases one by one: built the full database layer (SQLite with WAL mode, FTS5 trigram, CRUD operations, and automatic sync triggers), implemented change detection with a two-tier mtime-then-hash strategy to keep incremental updates fast, and completed the text chunker with smart boundary detection and overlap for context continuity.

The afternoon was all about hybrid search. Built an embedding client that communicates with the MLX sidecar over Unix sockets, supporting batch embeddings and graceful null returns when the server is unavailable. Integrated sqlite-vec for vector similarity search using cosine distance, with filters applied inside the SQL query to avoid rank distortion. Wired it all together with Reciprocal Rank Fusion (k=60) to merge vector and FTS results, with automatic FTS-only fallback when embeddings are unavailable. Finished with a server enhancement layer using Cheerio to inject CSS and JS into the HTML viewer without modifying the upstream codebase.

Background indexing completed successfully, processing 3,471 conversations into 596,168 searchable chunks. The non-blocking design ensures the server stays responsive during indexing.

The updated TDD plan now covers every design decision with explicit tests. Key improvements:

  1. Configuration tests - validate env vars, paths, embedding model tracking
  2. Trigram tokenizer - tested for substring matching (e.g., “auth” finds “authentication”)
  3. Sync triggers - explicit tests for INSERT/UPDATE/DELETE propagation to FTS
  4. Query sanitization - tests for FTS5 operators and special characters (AND, OR, NOT, quotes, parens)
  5. RRF formula - documented as k=60 with implementation code snippet

Phase 2 Complete! We’ve built the database layer:

  • SQLite schema with WAL mode, foreign keys, FTS5 trigram
  • Automatic FTS sync via INSERT/UPDATE/DELETE triggers
  • Full CRUD for metadata, conversations, and chunks
  • Query sanitization to prevent FTS5 syntax errors
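
One way to picture the query sanitization step is a minimal sketch like the one below. The function name sanitizeFtsQuery is hypothetical and this is not necessarily how the project does it; the only FTS5 behavior it relies on is that a double-quoted string is treated as literal text and that an embedded double quote is written as two double quotes.

```typescript
// Hypothetical helper (name and approach are assumptions, not the project's code).
// Turns raw user input into a MATCH string where every whitespace-separated
// term is a quoted FTS5 string, so AND/OR/NOT, parentheses, and stray quotes
// are searched as plain text instead of being parsed as query syntax.
function sanitizeFtsQuery(raw: string): string {
  return raw
    .split(/\s+/)
    .filter((term) => term.length > 0)
    // Inside an FTS5 string, a literal double quote is escaped by doubling it.
    .map((term) => '"' + term.replace(/"/g, '""') + '"')
    .join(" ");
}
```

Quoting every term trades away user-facing boolean operators in exchange for never raising a syntax error, which seems like a reasonable default for a search box.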

Now Phase 3 focuses on change detection - detecting which source files need reindexing.

Phase 3 Complete! Change detection uses a two-tier strategy:

  1. Quick mtime check - Skip files with unchanged timestamps
  2. Hash verification - Only hash when mtime differs

This avoids expensive SHA-256 hashing for unchanged files, making incremental updates fast.
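
The two tiers can be sketched as a small decision function. Everything here is illustrative: the IndexedFile shape, the status names, and the injected hashFile callback are assumptions, and the "touched" case (mtime changed, content identical) is one plausible way to handle a file that was rewritten without changes.

```typescript
// Hypothetical record of what was stored for a file at the last index run.
interface IndexedFile {
  mtimeMs: number;
  sha256: string;
}

type ChangeStatus = "new" | "unchanged" | "touched" | "modified";

// Tier 1 compares mtimes (cheap). Tier 2 hashes only when the mtime differs.
// hashFile is injected so the expensive step is only invoked when needed.
function detectChange(
  previous: IndexedFile | undefined,
  currentMtimeMs: number,
  hashFile: () => string
): ChangeStatus {
  if (previous === undefined) return "new";
  if (previous.mtimeMs === currentMtimeMs) return "unchanged"; // no hashing at all
  // mtime moved: confirm with a content hash before reindexing.
  return hashFile() === previous.sha256 ? "touched" : "modified";
}
```

Only "new" and "modified" would need reindexing under this sketch; "touched" would just refresh the stored mtime.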

Phase 5 Complete! The text chunker:

  • Prefers smart boundaries, in priority order (paragraph > code fence > line > sentence > word)
  • Includes overlap for context continuity between chunks
  • Keeps small code blocks intact
  • Estimates token counts with a ~4 characters per token approximation
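
A simplified sketch of boundary-aware chunking with overlap follows. It is not the project's chunker: code-fence handling and the "keep small code blocks intact" rule are left out to keep it short, and the names chunkText, SEPARATORS, and CHARS_PER_TOKEN are made up for illustration.

```typescript
// Separators in priority order: paragraph, line, sentence, word.
const SEPARATORS = ["\n\n", "\n", ". ", " "];
const CHARS_PER_TOKEN = 4; // rough approximation, as in the notes above

function chunkText(text: string, maxTokens: number, overlapTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const overlapChars = overlapTokens * CHARS_PER_TOKEN;
  // Keeping overlap at or below half a chunk guarantees forward progress.
  if (maxChars <= 0 || overlapChars < 0 || overlapChars * 2 > maxChars) {
    throw new Error("overlap must be between 0 and half the chunk size");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      // Cut at the highest-priority separator found in the second half of
      // the window; if none is found, fall back to a hard cut at maxChars.
      const minEnd = start + Math.floor(maxChars / 2);
      for (const sep of SEPARATORS) {
        const idx = text.lastIndexOf(sep, end - sep.length);
        if (idx >= minEnd) {
          end = idx + sep.length;
          break;
        }
      }
    }
    chunks.push(text.slice(start, end));
    if (end >= text.length) break;
    start = end - overlapChars; // next chunk re-reads the tail of this one
  }
  return chunks;
}
```

Each chunk after the first begins with the last overlapChars characters of the previous chunk, which is what gives a retrieved chunk some surrounding context.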

Now Phase 6 builds the Search API layer.

Embedding Client Design:

  1. Uses Unix socket for low-latency communication with MLX sidecar
  2. Includes health checks to detect when embedding server is unavailable
  3. Supports batch embeddings for efficiency (multiple texts in one request)
  4. Returns null when unavailable instead of throwing - enables graceful fallback

sqlite-vec Integration:

  1. sqlite-vec is a SQLite extension for vector similarity search
  2. Uses vec0 virtual tables with float arrays
  3. Provides vec_distance_cosine() for similarity calculations
  4. Must be loaded as an extension at runtime, since it is not built into better-sqlite3

Vector Search Design:

  1. Uses vec_distance_cosine() for similarity - lower distance = more similar
  2. Filters (project, role, dates) are applied INSIDE the SQL query to avoid rank distortion
  3. Overfetches (100 results) so the RRF merge with FTS results has enough candidates
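
To illustrate what "filters inside the SQL query" might look like, here is a hedged sketch of a query builder. Only chunks_vec and vec_distance_cosine() come from the notes; the chunks table, its project, role, and created_at columns, the join on rowid, and the function name buildVectorQuery are all assumptions about the schema.

```typescript
interface VectorFilters {
  project?: string;
  role?: string;
  after?: string;  // ISO date, inclusive lower bound
  before?: string; // ISO date, exclusive upper bound
}

// Returns SQL plus the parameters that follow the query embedding.
// The caller binds the query vector as the first "?", then these params.
function buildVectorQuery(
  filters: VectorFilters,
  limit: number = 100
): { sql: string; params: Array<string | number> } {
  const where: string[] = [];
  const params: Array<string | number> = [];
  if (filters.project !== undefined) { where.push("c.project = ?"); params.push(filters.project); }
  if (filters.role !== undefined) { where.push("c.role = ?"); params.push(filters.role); }
  if (filters.after !== undefined) { where.push("c.created_at >= ?"); params.push(filters.after); }
  if (filters.before !== undefined) { where.push("c.created_at < ?"); params.push(filters.before); }
  params.push(limit);

  const sql = [
    "SELECT c.id, vec_distance_cosine(v.embedding, ?) AS distance",
    "FROM chunks_vec AS v",
    "JOIN chunks AS c ON c.id = v.rowid", // hypothetical join key
    where.length > 0 ? "WHERE " + where.join(" AND ") : "",
    "ORDER BY distance ASC", // lower distance = more similar
    "LIMIT ?",
  ].filter((line) => line !== "").join("\n");
  return { sql, params };
}
```

Because the WHERE clause runs before ORDER BY and LIMIT, the 100 returned rows are the 100 nearest among rows that pass the filters, rather than the filtered remainder of an unfiltered top 100.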

Reciprocal Rank Fusion (RRF):

  1. Formula: score(doc) = Σ 1/(k + rank(doc)), summed over every result set that contains doc
  2. k=60 is the standard constant; it flattens the gap between adjacent ranks so the top few results of one list do not dominate
  3. Documents appearing in both sets get combined scores (boosted)
  4. Rank-based, not score-based - normalizes across different scoring systems
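
The formula above is short enough to sketch directly. The function name rrfMerge and the use of plain string IDs are illustrative choices, and ranks are taken as 1-based positions in each list.

```typescript
// Reciprocal Rank Fusion over any number of ranked ID lists.
// score(id) = sum of 1 / (k + rank) over every list containing the id,
// where rank is the 1-based position of the id in that list.
function rrfMerge(
  rankings: string[][],
  k: number = 60
): Array<{ id: string; score: number }> {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores[id] = (scores[id] || 0) + 1 / (k + index + 1);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id: id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}
```

For example, merging a vector ranking of a, b, c with an FTS ranking of b, d puts b first: it scores 1/62 + 1/61, while a, ranked first in only one list, scores 1/61.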

Hybrid Search Flow:

  1. Get query embedding from embedding client
  2. If embedding available: run vector search + FTS search, merge with RRF
  3. If embedding unavailable: fallback to FTS-only (with warning in response)
  4. Empty query: return recent conversations (existing behavior)
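
The branching in this flow can be sketched with the dependencies injected as plain functions. This is a simplification under stated assumptions: the embedding is passed in already resolved (null meaning the sidecar was unavailable), the searches are synchronous, and the names hybridSearch, SearchDeps, and the mode labels are hypothetical.

```typescript
interface SearchResult {
  ids: string[];
  mode: "recent" | "hybrid" | "fts-only";
  warning?: string;
}

// Hypothetical dependencies; the real ones would hit SQLite.
interface SearchDeps {
  vectorSearch: (embedding: number[], limit: number) => string[];
  ftsSearch: (query: string, limit: number) => string[];
  merge: (rankings: string[][]) => string[]; // e.g. an RRF merge
  recent: (limit: number) => string[];
}

const OVERFETCH = 100; // candidates pulled from each source before merging

function hybridSearch(
  query: string,
  embedding: number[] | null,
  deps: SearchDeps,
  limit: number = 10
): SearchResult {
  // Empty query: keep the existing "recent conversations" behavior.
  if (query.trim() === "") return { ids: deps.recent(limit), mode: "recent" };

  const fts = deps.ftsSearch(query, OVERFETCH);
  if (embedding === null) {
    // Embedding server unavailable: degrade to FTS-only and say so.
    return {
      ids: fts.slice(0, limit),
      mode: "fts-only",
      warning: "embeddings unavailable; results are FTS-only",
    };
  }
  const vec = deps.vectorSearch(embedding, OVERFETCH);
  return { ids: deps.merge([vec, fts]).slice(0, limit), mode: "hybrid" };
}
```

Having the embedding client return null rather than throw is what keeps the fallback branch a simple conditional instead of error handling.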

What was built:

  1. Embedding Client (src/embeddings/client.ts) - Unix socket client for qwen3-embeddings-mlx sidecar with health checks, batch support, and graceful null returns
  2. Vector Table (chunks_vec) - sqlite-vec virtual table for 2048-dimension vectors
  3. Sync Triggers - Auto-sync to both chunks_vec AND chunks_fts on INSERT/UPDATE/DELETE
  4. Vector Search - Cosine similarity search with integrated filters (project, role, dates)
  5. RRF Merge - Reciprocal Rank Fusion combining vector + FTS results
  6. Hybrid Search - Full hybrid flow with FTS fallback when embeddings unavailable

The enhanceHtml function uses Cheerio (jQuery-like HTML manipulation) to inject CSS and JS into existing HTML pages. This is a clean pattern for adding functionality without modifying the original HTML generator - keeps separation of concerns between the static HTML generator and the dynamic server enhancements.

The background indexing completed successfully, processing 3,471 conversations into 596,168 searchable chunks. The non-blocking design using setTimeout after server startup ensures the server is immediately responsive while indexing happens asynchronously. The status endpoint provides real-time visibility into indexing progress.


Courses: Stanford CS244C

Two threads today. First, refined the research framing for the Gavel project after professor feedback. The key shift: instead of “evaluating on different traces” (flagged as uninteresting), the research question now centers on a testable hypothesis that Gavel’s JCT degrades due to fragmentation on GPU-sharing workloads. The three-phase evaluation structure ensures replication first, then cross-validation to demonstrate the problem exists, then the actual contribution.

Second, studied the distributed lock recipe using ZooKeeper’s sequential ephemeral nodes. The “watch only your predecessor” pattern transforms O(n) wake-ups on lock release into O(1), since each waiter only watches the node directly ahead of it. The SEQUENTIAL flag provides a total order without additional coordination.

The design document captures the key shift in framing: instead of “evaluating on different traces” (which the professor flagged as uninteresting), the research question now centers on a testable hypothesis - that Gavel’s JCT degrades due to fragmentation on GPU-sharing workloads. The three-phase evaluation structure ensures you:

  1. First prove your simulators work (replication)
  2. Then demonstrate the problem exists (cross-validation)
  3. Finally show your solution helps (contribution)

This “watch only your predecessor” pattern is a common remedy for the herd effect: when the lock is released, only the next waiter is notified, so wake-ups per release drop from O(n) to O(1). The SEQUENTIAL flag is what makes this possible - it gives you a total order without any additional coordination.
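
The core decision in the lock recipe can be sketched without a ZooKeeper client: given the children of the lock node and our own node name, either we hold the lock or there is exactly one node to watch. The names decideLock and LockDecision are hypothetical; the sketch relies only on ZooKeeper appending a zero-padded 10-digit counter to SEQUENTIAL node names.

```typescript
type LockDecision =
  | { acquired: true }
  | { acquired: false; watch: string };

// SEQUENTIAL nodes end in a zero-padded 10-digit sequence number.
function sequenceOf(name: string): number {
  return parseInt(name.slice(-10), 10);
}

function decideLock(children: string[], me: string): LockDecision {
  const sorted = children.slice().sort((a, b) => sequenceOf(a) - sequenceOf(b));
  const index = sorted.indexOf(me);
  if (index < 0) throw new Error("own node missing from children: " + me);
  // Lowest sequence number holds the lock.
  if (index === 0) return { acquired: true };
  // Otherwise watch only the node directly ahead of us, not the lock holder.
  return { acquired: false, watch: sorted[index - 1] };
}
```

When the watched node disappears, the client would list the children again and rerun the same decision, since the predecessor may have vanished because its session expired rather than because it held and released the lock.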


Bhavanaai (Personal Site)

Set up the personal site from scratch today. Scaffolded an Astro project with Tailwind v4 (using the new @tailwindcss/vite plugin instead of PostCSS), content collections with Zod schema validation for type-safe frontmatter, and a dark mode toggle that runs as a blocking inline script to prevent theme flash. Deployed to GitHub Pages with a two-job workflow (build + deploy), configured the custom domain me.bhavanaai.com via CNAME, and removed the /bhavanaai/ base path since custom domains serve at root.

This appears to be a React + TypeScript project with:

  • Vite as the build tool (fast HMR and bundling)
  • Tailwind CSS for styling
  • Supabase for backend/database
  • ESLint for code linting

Astro + Tailwind v4: The integration automatically uses the new @tailwindcss/vite plugin (Tailwind v4 approach) instead of the older PostCSS-based setup. This is faster and simpler.

Astro Content Collections: The defineCollection with Zod schema provides type-safe frontmatter validation. If a blog post has an invalid date or missing title, Astro will error at build time rather than silently failing.

Dark mode strategy: The is:inline script runs before page render (blocking), checking localStorage first, then system preference. This prevents the “flash of wrong theme” that happens with client-side-only approaches.
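
The precedence described here (stored choice first, system preference second) fits in one small function. This is a sketch, not the site's actual script: the function name resolveTheme, the "theme" storage key, and the "dark" class mentioned in the comments are assumptions.

```typescript
type Theme = "light" | "dark";

// An explicit stored choice wins; otherwise follow the system preference.
// Any unexpected stored value is ignored rather than trusted.
function resolveTheme(stored: string | null, systemPrefersDark: boolean): Theme {
  if (stored === "light" || stored === "dark") return stored;
  return systemPrefersDark ? "dark" : "light";
}

// In the inline script the inputs would come from the browser, for example:
//   stored            <- localStorage.getItem("theme")   (key is an assumption)
//   systemPrefersDark <- window.matchMedia("(prefers-color-scheme: dark)").matches
// and the result would toggle a class such as "dark" on <html> before first paint.
```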

Tailwind v4 plugins: Unlike v3, which registered plugins in tailwind.config.js, v4 loads them from CSS with @plugin "@tailwindcss/typography". This keeps configuration in one place.

The workflow has two jobs: build (18s) compiles Astro to static HTML, deploy (9s) pushes to GitHub Pages. Future deploys will be similarly fast since Astro generates lightweight static files.

GitHub Pages base path: When deploying to username.github.io/repo-name, all internal links must include the repo name. Astro exposes the configured base (set via base in the config) as import.meta.env.BASE_URL, but it does not rewrite hand-written links, so prefix internal navigation with it.
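
A small helper makes the prefixing less error-prone, mainly around duplicate or missing slashes. The name withBase is hypothetical; in an Astro component the first argument would be import.meta.env.BASE_URL.

```typescript
// Joins a base path and an internal path with exactly one slash between them.
// Works whether or not the base has a trailing slash or the path a leading one.
function withBase(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, "");
  const trimmedPath = path.replace(/^\/+/, "");
  return trimmedBase + "/" + trimmedPath;
}
```

With a base of /bhavanaai/ this turns /about into /bhavanaai/about, and with a base of / it leaves it as /about, so the same links keep working after a move to a custom domain.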

CNAME vs A record: For subdomains like me.bhavanaai.com, use CNAME pointing to username.github.io. For apex domains (bhavanaai.com), you’d need A records pointing to GitHub’s IPs. CNAME is simpler and follows GitHub’s IP changes automatically.

With a custom domain, you no longer need the /bhavanaai/ base path. GitHub Pages serves the site at the root of your domain, making URLs cleaner: me.bhavanaai.com/about instead of varunr89.github.io/bhavanaai/about.


Tools

Explored the Duffel flights API for multi-city trip search. Test mode returns placeholder airlines (“Duffel Airways”), but the structure works: round-trip searches split the total price evenly between legs as an approximation, and the top results were all nonstop, well within the max 1-stop constraint.

  • Duffel API returns “Duffel Airways” for some test flights - in production mode this would show real airline names
  • The round-trip search splits total price evenly between legs (an approximation) - actual booking would use bundled pricing
  • All top results were nonstop on every leg, which satisfies the max 1-stop constraint
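
The even price split is simple arithmetic, but doing it in integer cents avoids the per-leg amounts drifting from the total. This is an illustrative sketch only; splitEvenly is a made-up name and nothing here reflects the Duffel API itself.

```typescript
// Splits a total (in integer cents) across legs so the parts always sum
// back to the total; any leftover cents go to the earliest legs.
function splitEvenly(totalCents: number, legs: number): number[] {
  if (!Number.isInteger(totalCents) || totalCents < 0) {
    throw new Error("totalCents must be a non-negative integer");
  }
  if (!Number.isInteger(legs) || legs <= 0) {
    throw new Error("legs must be a positive integer");
  }
  const base = Math.floor(totalCents / legs);
  const remainder = totalCents - base * legs;
  const shares: number[] = [];
  for (let i = 0; i < legs; i++) {
    shares.push(base + (i < remainder ? 1 : 0));
  }
  return shares;
}
```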