Bhavana AI

AI/ML insights

Dev Log: February 15, 2026

Nanny Timesheet

A massive build day. The entire Nanny Timesheet app went from implementation through deployment, covering auth, API routes, dark mode, UI consolidation, holiday management, email notifications, Docker containerization, and data import. Started the morning finishing code reviews on the core modules (Tasks 4-7), adapted the auth layer from next-auth v4 to v5 (Auth.js), wired up Azure Communication Services for email, and switched from magic link to Google OAuth. Built the admin UI with role toggling and holiday management, imported all historical data from the spreadsheet, fixed dark mode styling across every form, consolidated the calendar into the dashboard, and added vacation request cancellation. Also wrestled with Docker (Alpine + better-sqlite3 native addon compilation, SQLite + Azure Files POSIX locking incompatibility). Ended the day with automated screenshot capture for the blog post, and wrote the draft post itself.

Both reviews passed for Tasks 4-7:

  • Spec review: All 4 modules match their specifications exactly, with sensible adaptations (string-based APIs for timezone safety)
  • Code quality: Approved at 8.5/10. Only minor suggestions (JSDoc, exporting types) — no code changes needed since the logic is self-evident and types will be exported when consumed downstream

The plan was written for next-auth v4 API style (NextAuthOptions, import NextAuth from "next-auth"). With Next.js 16.1.6, the current version of next-auth is v5 (Auth.js), which has a different API surface. The implementer will need to adapt — similar to how we handled the Prisma v5 to v7 migration in Task 2.
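For reference, a minimal sketch of the v5 (Auth.js) configuration shape the implementer will be adapting to. This is illustrative, not our final config: the provider matches the Google OAuth we later switched to, and the allowlist email is a placeholder.

```typescript
// Hypothetical auth.ts under Auth.js (next-auth v5). In v4 you built an
// NextAuthOptions object and passed it to a route handler; in v5 the
// NextAuth() call itself returns the handlers and helpers you export.
import NextAuth from "next-auth";
import Google from "next-auth/providers/google";

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [Google],
  callbacks: {
    // Allowlist enforcement lives in the signIn callback in our setup.
    // The email below is a placeholder, not a real allowlist entry.
    signIn({ user }) {
      return ["someone@example.com"].includes(user.email ?? "");
    },
  },
});
```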

The code quality reviewer flagged error handling and type assertions, but these are acceptable tradeoffs for this 3-user internal app:

  • Type assertion (as never): Necessary bridge between Prisma v7 custom output and PrismaAdapter’s expected types. Runtime shapes are identical.
  • Email error handling: NextAuth wraps the sendVerificationRequest call and handles errors at the framework level. Adding our own try-catch would be redundant.
  • Session null check: The if (dbUser) guard already handles the edge case. Users will be pre-seeded (Task 26), and the allowlist prevents unknown emails.
For the API routes, both reviews passed as well:

  • Spec review: All 5 API routes fully compliant. Zod validation, parseLocalDate(), admin guards, and atomic transactions all verified.
  • Code quality: Approved with minor suggestions. The reviewer noted inconsistent error responses (some routes return [], others return 404 for a missing nanny) and a type assertion on computeBalance; both are acceptable for v1. The NOTIFY_EMAIL! non-null assertion is safe because the email functions are only called fire-and-forget with .catch(console.error), so a missing env var just logs an error rather than crashing.

The entire Nanny Timesheet app is now built and pushed. Here’s what was accomplished:

  • 39 unit tests across 7 test files, all passing
  • 12 E2E test scenarios (Playwright, ready to run with npm run test:e2e)
  • 35 commits from design doc through full implementation

Azure Communication Services vs Resend API differences:

  • ACS uses EmailClient with a connection string (not an API key)
  • Sending is done via beginSend() which returns a poller (async operation) — Azure queues the email
  • The sender address must come from a provisioned ACS domain (either Azure-managed like DoNotReply@<guid>.azurecomm.net or your own custom domain)
  • All email sending is now centralized in one sendEmail() helper, making future changes (logging, retries) easy to add in one place
  • Azure-managed email domains come pre-verified with SPF/DKIM/DMARC, so they work immediately — no DNS configuration needed. The tradeoff is the sender domain is a GUID-based subdomain, which looks less professional than a custom domain but is perfectly fine for internal/family use.
  • ACS resources themselves live in the global region (not a specific Azure region like eastus), though the data location is set to UnitedStates for data residency compliance.
  • The beginSend() method on the ACS EmailClient is asynchronous — it returns a poller you can optionally await for delivery status, but for fire-and-forget notifications (our use case), not awaiting the result is fine.
  • Next.js 16 deprecated the middleware file convention in favor of proxy. The re-export pattern export { default } from "next-auth/middleware" is causing issues because Next.js 16 expects a direct function export.
  • The server started successfully despite the warning, but the auth middleware may not be functioning, meaning routes aren’t protected.
  • Azure-managed email domains use a GUID subdomain that looks suspicious to spam filters. For production, setting up a custom domain (e.g., nanny.yourdomain.com) with proper SPF/DKIM/DMARC records would improve deliverability significantly.
  • The beginSend() API is fire-and-forget by default — our sendEmail() function wasn’t calling pollUntilDone(), so errors would be silently swallowed. For auth emails, it might be worth polling to catch failures.
  • NextAuth’s PrismaAdapter expects specific fields on the User model: emailVerified (DateTime?) and image (String?). When using magic link auth, NextAuth sets emailVerified after the user clicks the link. Without this field, Prisma throws an “Unknown argument” error.
  • We also made name optional (String?) and gave role a default value (@default("nanny")) so that if NextAuth creates a user record automatically (for an email not in the seed data), it won’t fail on required fields.
  • Google OAuth with NextAuth uses the “database” session strategy, meaning after the first Google sign-in, NextAuth creates a User + Account record in SQLite. The signIn callback still enforces the email allowlist, so only your approved Google accounts can access the app.
  • The ALLOWED_EMAILS list now needs to contain the Google email addresses (Gmail or Google Workspace) that each person will sign in with, not necessarily iCloud addresses.
  • The role toggle is purely client-side (React context) — it switches which nav links and pages you see, but doesn’t change your actual database role. API routes still check session.user.role for admin-only operations like manual adjustments.
  • The AppShell component adds top padding (pt-8) when you’re an admin so the toggle banner doesn’t overlap page content.
  • OAuthAccountNotLinked is NextAuth’s safety guard: if a user record already exists (from seeding or a different auth provider), it won’t let a new OAuth provider claim that account by default. The allowDangerousEmailAccountLinking flag bypasses this, which is safe when you control the user list via an allowlist.
  • NextAuth’s withAuth middleware only works with JWT sessions because middleware runs on the Edge runtime, which can’t make database queries. Switching to JWT sessions and moving auth guards to the client component (AppShell) is the standard pattern for Next.js 16+, which deprecated the middleware file convention entirely.
  • The jwt callback runs when the token is created/updated, so we look up the user’s role from the database there and embed it in the JWT. The session callback then copies it from the token to the session object for client access.
  • The import uses all raw data from both Year 1 and Year 2 of the spreadsheet — 12 months of Year 1 accruals, 21 Year 1 usage entries, 4 months of Year 2 accruals, and 7 Year 2 usage entries. By importing all transactions rather than just a carryover amount, the ledger view will show complete history.
  • The balances endpoint returns 404 when there’s no nanny user (role: "nanny"). Creating Jessa as a nanny user (even with a placeholder email jessa@nanny.local) fixes this. The admin views query for the nanny user’s data automatically.
  • The accrual cron is designed to run on the 1st of each month and uses a unique constraint (accrual_idempotency) on [userId, type, category, date] to prevent duplicate accruals. This means you can safely re-run it without creating duplicates — it’ll just return already_exists.
  • The adjustment form defaults to today’s date but lets you backdate entries, which is useful for correcting missed accruals or recording historical adjustments.
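The JWT role-embedding pattern from the auth bullets above can be sketched roughly like this. The callbacks live in the NextAuth config in the real app; here they're standalone functions, and fetchRoleFromDb is a hypothetical stand-in for the Prisma lookup.

```typescript
// Minimal sketch, not the real config: the jwt callback embeds the DB role
// in the token, and the session callback copies it over for client access.
type Token = { sub?: string; role?: string };
type Session = { user: { email: string; role?: string } };

// Hypothetical stand-in for prisma.user.findUnique(...).
async function fetchRoleFromDb(userId: string): Promise<string> {
  return "admin";
}

// jwt callback: runs when the token is created/updated, so the role lookup
// happens here, not in Edge middleware (which can't query the database).
async function jwtCallback(args: { token: Token; userId?: string }): Promise<Token> {
  if (args.userId) args.token.role = await fetchRoleFromDb(args.userId);
  return args.token;
}

// session callback: copy the role from the token onto the session object.
function sessionCallback(args: { session: Session; token: Token }): Session {
  args.session.user.role = args.token.role;
  return args.session;
}
```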

The root cause: the app’s globals.css defines dark mode CSS variables (--foreground: #ededed) via prefers-color-scheme: dark, but form inputs use hardcoded light-mode colors like bg-white, border-gray-300, and sometimes text-gray-900. In dark mode, inherited light text on white input backgrounds becomes invisible.

  • color-scheme: dark in globals.css tells the browser to render native form controls (date pickers, number spinners) in dark mode. Without this, browser-rendered parts of <input type="date"> would still use light styling.
  • Tailwind’s dark: variant works with prefers-color-scheme by default in v4, so it aligns perfectly with the existing CSS variable setup in globals.css.
  • The consistent dark mode palette: bg-gray-950 (page bg), bg-gray-900 (cards), bg-gray-800 (inputs), border-gray-600 (borders), text-gray-100 (headings), text-gray-300 (labels), text-gray-400 (secondary text).

This is a classic “consolidation” refactor — merging two pages into one. The key challenge is combining the state management and API calls from both pages without duplicating logic. We’ll need to bring the calendar’s month navigation state and holiday API call into the dashboard page.

The calendar page fetches /api/requests, /api/extra-hours, and /api/holidays — the dashboard already fetches the first two. We just need to add /api/holidays to the existing Promise.all and bring in the calendar’s month navigation state + UI.
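The merged fetch can be sketched like this (a hedged outline of the consolidation, not the dashboard's actual code; the endpoint paths come from the log, the function name is made up):

```typescript
// Dashboard data loader after the consolidation: the first two fetches
// already existed; /api/holidays is the one pulled in from the calendar page.
async function loadDashboardData() {
  const [requests, extraHours, holidays] = await Promise.all([
    fetch("/api/requests").then((r) => r.json()),
    fetch("/api/extra-hours").then((r) => r.json()),
    fetch("/api/holidays").then((r) => r.json()), // added for the merged calendar
  ]);
  return { requests, extraHours, holidays };
}
```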

Running tsc --noEmit directly on a Next.js project often produces false positives from node_modules because Next.js uses its own TypeScript configuration during build. The next build output (which only failed on the pre-existing better-sqlite3 typing issue) is the reliable check here.

This plan involves a common pattern in admin UIs: pre-populating defaults that users can then customize. The key design choices here are:

  1. Client-side date computation for floating holidays (e.g., “4th Thursday of November”) keeps the logic simple and avoids needing a holiday calculation library on the server
  2. Inline editing with immediate PUT gives a smoother UX than modal dialogs or save buttons

The existing holidaySchema validates { name, date, cycleYear } for POST. For the PUT endpoint, we only need { id, date }, since we're updating the date on an existing record. Rather than creating a new Zod schema, a simple regex check on the date string keeps it lightweight; the id comes straight from the database, so it doesn't need schema validation.
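A lightweight check along those lines might look like this (a sketch, not the actual route code; the function name is hypothetical):

```typescript
// Validate a YYYY-MM-DD date string without pulling in a schema library.
function isValidDateString(s: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return false;
  const [y, mo, d] = [Number(m[1]), Number(m[2]), Number(m[3])];
  // Date.UTC rolls out-of-range days over (Feb 30 becomes Mar 2),
  // so round-trip the components to reject impossible dates.
  const dt = new Date(Date.UTC(y, mo - 1, d));
  return (
    dt.getUTCFullYear() === y && dt.getUTCMonth() === mo - 1 && dt.getUTCDate() === d
  );
}
```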

Computing floating holidays with date math: Holidays like “1st Monday of September” or “last Monday of May” require a common pattern: iterate through dates in a month to find the nth occurrence of a specific weekday. The algorithm is:

  1. Start at the 1st of the month
  2. Advance to the first occurrence of the target weekday
  3. Add 7 days for each additional occurrence needed

For “last Monday of May,” work backwards from the end of the month instead.
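The steps above can be sketched as two small helpers (function names are hypothetical; the formula matches the nthWeekdayOf math noted later in the log):

```typescript
// weekday: 0 = Sunday .. 6 = Saturday; n: 1-based occurrence; month: 0-based.
function nthWeekdayOf(year: number, month: number, weekday: number, n: number): Date {
  const first = new Date(Date.UTC(year, month, 1));
  // Day-of-month of the first occurrence of the target weekday.
  // The "+ 7) % 7" handles wrap-around when the target weekday falls
  // earlier in the week than the 1st.
  const day = 1 + ((weekday - first.getUTCDay() + 7) % 7);
  return new Date(Date.UTC(year, month, day + 7 * (n - 1)));
}

// "Last Monday of May" style holidays: work backwards from month end.
function lastWeekdayOf(year: number, month: number, weekday: number): Date {
  const last = new Date(Date.UTC(year, month + 1, 0)); // day 0 of next month
  const day = last.getUTCDate() - ((last.getUTCDay() - weekday + 7) % 7);
  return new Date(Date.UTC(year, month, day));
}
```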

Cycle year mapping: The cycle year (Sept Y to Aug Y+1) means Labor Day and Thanksgiving use year Y, while New Year’s, Memorial Day, and Independence Day use year Y+1. Christmas straddles — it’s in December of year Y.

Key implementation details worth noting:

  1. nthWeekdayOf math: The formula 1 + ((weekday - first.getDay() + 7) % 7) finds the first occurrence of a target weekday in a month. The +7 % 7 handles the wrap-around when the target weekday is earlier in the week than day 1. Then + 7 * (n-1) jumps to the nth occurrence.

  2. Optimistic UI update for date changes: updateHolidayDate fires the PUT and immediately updates local state on success. This avoids a full reload while keeping the UI responsive. If the PUT fails, the date input snaps back on the next render since state was never updated.

  3. Simplification of Add Holiday form: By removing the cycle year dropdown from the form and using holidayCycleYear (the viewer’s selected year), we avoid a confusing UX where adding a holiday to cycle year X while viewing cycle year Y would make the new holiday invisible.

This plan implements cancellation of vacation requests by deleting the request and its linked transactions. The key architectural decisions:

  1. Transactional deletion - Since the schema lacks cascade deletes, we must manually delete linked transactions before the request itself, wrapped in $transaction to maintain data consistency.
  2. Future-only guard - Server-side date check prevents canceling past requests, even if someone crafts a manual API call.
  3. Inline confirmation - Avoids modal overhead for a simple destructive action, keeping the UI lightweight.

The holidays DELETE handler (line 63-73) is our template. Key differences for requests:

  1. No admin-only guard — nannies should cancel their own requests, so we verify ownership instead of role.
  2. Date guard — we add a future-date check server-side to prevent canceling past time off.
  3. Transactional multi-table delete — unlike holidays (single table), we must delete linked Transaction records first since there’s no onDelete: Cascade in the Prisma schema.

Using prisma.$transaction([...]) with an array of operations (batch transaction) vs the interactive callback form used in POST. The batch form is simpler here because we don’t need intermediate results — just “delete transactions, then delete request” atomically. If either fails, both roll back.
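A hedged sketch of that batch shape (model and field names like transaction, request, and requestId are assumed from the log; the client is passed in so the sketch stays self-contained):

```typescript
// Structural stand-in for the Prisma client surface this sketch needs.
type PrismaLike = {
  transaction: { deleteMany(args: { where: { requestId: string } }): Promise<unknown> };
  request: { delete(args: { where: { id: string } }): Promise<unknown> };
  $transaction(ops: Promise<unknown>[]): Promise<unknown[]>;
};

async function cancelRequest(prisma: PrismaLike, requestId: string) {
  // Batch form: no intermediate results needed, just two deletes that
  // commit or roll back together. Linked transactions go first because
  // the schema has no onDelete: Cascade.
  return prisma.$transaction([
    prisma.transaction.deleteMany({ where: { requestId } }),
    prisma.request.delete({ where: { id: requestId } }),
  ]);
}
```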

The implementation has two layers of defense for the “future only” constraint:

  1. Client-side (isFutureOrToday) - controls UI visibility of the cancel button, preventing accidental clicks.
  2. Server-side (DELETE handler date check) - prevents API abuse. Even if someone crafts a DELETE request manually, the server rejects past-date cancellations.

Both use the same setHours(0,0,0,0) pattern to normalize to midnight for date-only comparison, avoiding time-of-day edge cases.
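That shared normalization can be sketched like this (the name isFutureOrToday comes from the log; the optional now parameter is added here for testability):

```typescript
// Date-only comparison: normalize both sides to local midnight so a
// request at 9am today still counts as "today", not "in the past".
function isFutureOrToday(date: Date, now: Date = new Date()): boolean {
  const a = new Date(date);
  a.setHours(0, 0, 0, 0);
  const b = new Date(now);
  b.setHours(0, 0, 0, 0);
  return a.getTime() >= b.getTime();
}
```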

The getNotifyEmail() function queries the settings table for the admin-configured email on each notification. This is a lightweight query (primary key lookup) and ensures the email is always current. The NOTIFY_EMAIL env var now serves as a fallback if no setting has been configured in the UI.

The issue: emailClient.beginSend() returns a poller, not the final result. It starts the send operation but awaiting it only waits for the operation to be accepted, not completed. The email could be failing on Azure’s side after acceptance, and we’d never know because we’re not polling for the final status.

I kept the pollUntilDone() call even after removing the debug logs. The original code only called beginSend() without polling, which meant errors during delivery were silently lost. Now it actually waits for Azure to confirm delivery, so any failures will properly propagate to the .catch(console.error) in the route handler.
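The fixed helper looks roughly like this. It's a sketch, not the app's exact code: the client is injected rather than constructed from a connection string, the EMAIL_SENDER env var name is made up, and the client shape follows @azure/communication-email's EmailClient.

```typescript
// Minimal structural types for the ACS client surface this sketch uses.
type Poller = { pollUntilDone(): Promise<{ status: string }> };
type EmailClientLike = { beginSend(message: unknown): Promise<Poller> };

async function sendEmail(
  client: EmailClientLike,
  to: string,
  subject: string,
  plainText: string
) {
  const poller = await client.beginSend({
    // Env var name is hypothetical; the sender must be from a provisioned
    // ACS domain, e.g. an Azure-managed DoNotReply@<guid>.azurecomm.net.
    senderAddress: process.env.EMAIL_SENDER,
    recipients: { to: [{ address: to }] },
    content: { subject, plainText },
  });
  // Awaiting beginSend() only confirms acceptance; pollUntilDone() waits
  // for Azure's final status, so delivery failures actually propagate.
  const result = await poller.pollUntilDone();
  if (result.status !== "Succeeded") {
    throw new Error(`Email send failed: ${result.status}`);
  }
  return result;
}
```

Callers can still fire-and-forget with .catch(console.error); the difference is that delivery-time failures now reach that catch instead of vanishing.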

The postinstall hook in package.json runs prisma generate whenever npm ci is called. In the deps stage, only package.json and package-lock.json are copied (for layer caching), so the prisma schema isn’t available yet. The fix is to skip postinstall during npm ci with --ignore-scripts, since prisma generate runs explicitly in the builder stage anyway.

better-sqlite3 is a native Node.js addon — it compiles a C++ .node binary during npm install. Our --ignore-scripts flag skipped that compilation, and Next.js standalone output doesn’t bundle native addons. We need to: (1) install build tools on Alpine, (2) compile the native addon, and (3) copy it into the runner stage.

SQLite requires POSIX file locking, but Azure Files uses the SMB protocol which doesn’t support it. This is a fundamental incompatibility. The solution: copy the database to the container’s local filesystem at startup, work with it there (fast + proper locking), and sync it back to Azure Files periodically and on shutdown.

The bhavanaai blog has a very specific voice: narrative-driven, conversational, first-person, no bullet lists for core arguments, no em-dashes, minimal H3 headers. Existing posts range from data-heavy analysis (4,000-9,500 words) to shorter tool showcases. This project writeup will be narrative but more concise, matching the “where-should-i-live” pattern.

The blog post is marked draft: true because screenshots still need to be added. In Astro’s content collection setup here, draft posts are filtered out of the main listing (getCollection('blog', ({ data }) => !data.draft)) but still generate routes via getStaticPaths(). The Projects section links directly to the slug, so it will work even while the post is in draft.

The screencapture -l <CGWindowID> approach works great once Screen Recording permission is granted. The key was getting the CGWindowID via Python’s Quartz.CGWindowListCopyWindowInfo() rather than AppleScript’s window ID (which is a different number). Combined with osascript to navigate Chrome between pages, this gave us fully automated screenshot capture.


Short Projects

A few small efforts across the blog itself and some testing. Built a comments/feedback system for the blog using GitHub’s Contents API to persist comment data, with an absolute-positioning trick to float comments in the right margin on wide screens. Also explored adversarial reasoning with the “car wash test” post, and ran into environment issues with the Bash tool late in the day.

Key porting decisions from preview.html to the Astro blog:

  1. Layout strategy: Instead of restructuring the DOM with flexbox, the comments column is positioned absolutely with left: 100% on main, floating it in the right margin. This avoids touching the existing Astro layout.
  2. Dark mode: preview.html uses @media (prefers-color-scheme: dark), but the Astro blog uses a JS-toggled .dark class, so all dark mode selectors are converted.
  3. GitHub API sync: The preview.html saves to a local file. The blog version commits feedback/{slug}.json to the repo via GitHub Contents API, enabling cross-device commenting.

Why is:inline with external src works well here: Astro’s <script> tags (without is:inline) get processed, bundled, and deduplicated by the build pipeline. But is:inline tells Astro to emit the tag exactly as written. Since comments.js is a self-contained IIFE that self-guards (exits early if no PAT in localStorage), it behaves correctly as a standalone script: no module bundling needed, no side effects for non-admin users.

The absolute positioning trick for the comments column: Instead of restructuring the DOM with flexbox (which would affect the header and footer), the comments column is positioned absolute with left: 100% on the main element. This places it in the right margin of the viewport without touching the existing layout. On screens narrower than 75rem, the column is hidden and inline cards appear below each commented element instead.

Astro’s CSS scoping vs dynamic DOM: Astro scopes <style> blocks by default — it appends a unique [data-astro-cid-xxx] attribute selector to every CSS rule, and adds the matching attribute to all elements rendered by that component. But elements created at runtime via document.createElement() never get this attribute, so scoped CSS silently fails on them. This is why global.css needs is:global — its styles need to apply to both Astro-rendered and JS-created elements.

Why mouseup doesn’t work on iOS Safari: On mobile Safari, text selection is handled by the OS-level selection system, not standard mouse events. When you long-press and drag to select text, no mouseup fires. The selectionchange event on document is the correct cross-platform way to detect when a user has made a text selection. We use a debounce so the button only appears once the user has finished adjusting their selection handles.

This is a fascinating adversarial reasoning test — it’s essentially a “common sense override” problem. The model needs to resist the statistical prior (most “walk vs drive” questions on the internet genuinely are about choosing walking) and instead apply causal reasoning: the car IS the object being serviced, so it must go to the car wash. This tests whether RL-trained reasoning can override pre-training biases.

macOS screenshots often contain Unicode narrow no-break spaces (U+202F) before “AM”/“PM” in timestamps. These look identical to regular spaces but break string-based file operations. Using glob patterns (Screenshot*3.14.47*) instead of exact filenames is the reliable way to handle this.

The Bash tool failure pattern (every command returning exit code 1 with no output) is likely a sandbox or hook configuration issue in your environment, not a code problem. It started working earlier in the session and then stopped. Restarting the Claude Code session or checking your hooks config (/hooks) would likely fix it.


Courses

Built a graph digitizer tool for extracting data from research paper figures, reorganized a course project repository, and re-traced Gavel/FGD experiment data from paper figures. The digitizer supports calibration, drag-to-adjust points, import/export of JSON data, and projection guide lines. The repository reorganization moved files with git mv in logical batches with test verification at each step. The re-traced data revealed that the old OCR values had significant errors (up to 5.5x off), and the corrected values actually show much closer alignment between our replication results and the original paper.

Before reorganizing a project directory, it’s critical to first understand the full current state — what exists, what depends on what, and what’s actually used vs. abandoned. This prevents accidentally breaking import paths or deleting runtime dependencies.

fgd_placement.py uses sys.path.insert(0, "../../fgd_src") to find fgd.py. This relative-path hack means any move of fgd_src/ requires updating that single line. It's a common pattern in research repos where code grew organically rather than being packaged properly.

The plan is structured so each task produces a valid, committable state. This is critical for a large reorganization — if something goes wrong mid-way, you can bisect to find which move broke things. The ordering matters too: deletions first (Task 1) so moves don’t carry dead files, then src/ moves (Tasks 2-3) before experiment moves (Tasks 4-5), since experiments depend on src/.

For a file reorganization (as opposed to feature implementation), the normal per-task subagent + two-stage review cycle is overkill. The “implementation” is git mv and git rm commands, and the “review” is running tests to confirm nothing broke. I’ll batch aggressively: all structural changes in one subagent, then docs + verification in parallel.

The key to a safe large-scale reorganization: (1) use git mv so git tracks file identity across moves, (2) commit after each logical batch so you can bisect if something breaks, (3) run the full test suite at the end to catch any broken import paths. The one surprise here was test_simulator.py importing from simulator which had moved to a different directory — cross-boundary imports are the #1 thing that breaks in reorganizations.

This is a data digitization task: extracting numerical values from a graph image by clicking on points. It's a common research workflow when paper data isn't available in machine-readable form. A lightweight web tool with click-to-coordinate mapping is the right approach.

Graph digitization is essential in research when papers don’t publish raw data. The key challenge is mapping pixel coordinates to data coordinates through calibration — clicking two known reference points to establish a linear transform between screen space and data space.

Calibration is the key step. The tool maps pixel coordinates to data coordinates using two reference points (origin + extent). By clicking the exact bottom-left and top-right corners of the plot axes, you establish a linear transform: dataX = xMin + (pixelX - originX) / (extentX - originX) * (xMax - xMin). The Y axis inverts naturally because pixel Y increases downward while data Y increases upward.

The drag system uses a mousedown/mousemove/mouseup pattern instead of a simple click handler. The key trick: on mousedown near a point, we enter drag mode. On mouseup, if we didn’t move (moved: false), it was just a click on the point (no-op). If we mouseup on empty space without dragging, it adds a new point. This prevents accidental point creation when adjusting existing ones.

Import requires data2px — the reverse of px2data. Given calibration points (origin at xMin,yMin and extent at xMax,yMax), the reverse mapping is: pixelX = origin.px + (dataX - xMin)/(xMax - xMin) * (extent.px - origin.px). Y naturally inverts because origin (bottom-left) has a larger pixel-Y than extent (top-right).
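Both mappings can be sketched in one dimension (the tool applies the same transform per axis; names and the Calib shape are hypothetical):

```typescript
// Calibration for one axis: two reference points pairing a pixel
// coordinate with a known data value.
type Calib = { originPx: number; extentPx: number; min: number; max: number };

// pixel -> data: linear interpolation between the two reference points.
function px2data(px: number, c: Calib): number {
  return c.min + ((px - c.originPx) / (c.extentPx - c.originPx)) * (c.max - c.min);
}

// data -> pixel: the exact inverse, needed for JSON import.
function data2px(v: number, c: Calib): number {
  return c.originPx + ((v - c.min) / (c.max - c.min)) * (c.extentPx - c.originPx);
}
```

For the Y axis, originPx (bottom-left) is larger than extentPx (top-right), so the inversion between screen space and data space falls out of the same formula with no special casing.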

Two different guide modes based on context. During calibration (before the plot area is defined), full crosshair lines span the entire canvas — this helps you align with axis tick marks on the image. After calibration, the guides switch to projection lines that drop from your cursor to the X and Y axes of the calibrated plot area, with labeled value badges at each axis. This mimics how you’d read a graph by hand — follow the point down to the X axis and left to the Y axis.

Both JSON files use a nested structure — Gavel nests under fig9/fig10/fig11, FGD nests under fig7a_frag_rate_pct/fig7b_frag_over_total_pct/etc. The import now uses the selected figure’s jsonKey to drill into the right sub-object before extracting series data. This way you can load the full multi-figure JSON and it pulls only the data for the figure you selected.

The key bug was a duplicate const fig declaration in importJSON() — JavaScript treats this as a SyntaxError at parse time, which prevents the entire <script> block from executing. Unlike runtime errors which only affect the failing function, parse errors kill all code in the script tag, including unrelated event handlers like the drop-zone click listener.
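The parse-time vs runtime distinction is easy to demonstrate (this checker is illustrative, not code from the tool):

```typescript
// new Function compiles its body without executing it, so a duplicate
// const binding surfaces as a SyntaxError immediately, while a call to a
// missing function compiles fine and would only fail at runtime.
function isParseError(src: string): boolean {
  try {
    new Function(src);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}
```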

The traced data differs significantly from the old OCR values. For example, in Fig 9a the old data had gavel JCT at 4-5 hours at low load, while the traced values show ~22 hours. The x-axis ranges also changed — Fig 9 now ends at 6.5 jph instead of 8.0, and Fig 11 has finer-grained x-points (2.5, 2.9, 3.1, 3.3) not present in the old data. This illustrates why manual tracing is more reliable than automated OCR for graph digitization.

The difference between OCR/guessed values and hand-traced values is dramatic here. The old Fig 9 data had Gavel JCT at 4 hours at low load, while the traced value is 22 hours — a 5.5x error. This is why the graph digitizer tool was needed: automated OCR of plotted curves is notoriously unreliable, especially when lines overlap or the resolution is low.

The comparison table reveals something interesting: the traced paper curves now align much better with our replication results in terms of shape and relative ordering. For example, in Fig 10 our Gavel at 2.0 jph is 15.6h vs the paper’s 18h (close), and our baseline at 2.0 jph is 20.2h vs paper’s 25h (reasonable). The old OCR data had the paper Gavel at 9h there — which was clearly wrong and made our replication look much worse than it actually is.