Dev Log: January 13, 2026
courses
Set up a new git repo for course work using the GitHub CLI, dealing with nested git repo cleanup along the way. Then dove into GPU cluster scheduling research, reading papers on FGD (Fragmentation Gradient Descent, which schedules fractional GPU requests so as to limit fragmentation) and Gavel (heterogeneous GPU scheduling), comparing how they solve complementary problems in real cluster environments.
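- gh repo create --source=. --push combines three steps: it creates the remote repo, adds it as origin, and pushes the local commits. That is much faster than doing each step manually. The push only happens when --push is passed (or confirmed at the interactive prompt).
- Nested git repos are a common gotcha: git add records the inner repo as a gitlink (a submodule-style pointer to a commit) instead of tracking its files, which can cause confusion. Flattening (removing the inner .git) is simplest when the nested repo doesn’t need independent versioning.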
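Before flattening, it helps to know where the nested repos actually are. A minimal sketch, assuming a plain directory walk is good enough; the function name find_nested_git_repos is made up for this log and is not part of git or gh:

```python
import os
import sys


def find_nested_git_repos(root):
    """Return paths (relative to `root`) of directories below `root`
    that contain their own .git entry. `root` itself is not reported."""
    nested = []
    root_abs = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root_abs):
        has_git = False
        if ".git" in dirnames:
            # Do not descend into the .git directory itself.
            dirnames.remove(".git")
            has_git = True
        elif ".git" in filenames:
            # A .git *file* marks a submodule or linked worktree checkout.
            has_git = True
        if has_git and dirpath != root_abs:
            nested.append(os.path.relpath(dirpath, root_abs))
    return sorted(nested)


if __name__ == "__main__":
    start = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_nested_git_repos(start):
        print(path)
```

Each path it prints is a candidate for flattening. Whether to delete that inner .git is still a judgment call, since doing so throws away the inner repo's history.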
Gavel vs FGD: These papers solve complementary problems:
- FGD (the paper we just read): Handles partial GPU allocation (0.3 GPU, 0.7 GPU) and the fragmentation it causes
- Gavel: Handles heterogeneous GPU types (V100 vs K80) and how to fairly/efficiently allocate across them
In a real cluster with both problems, you’d want elements of both solutions.
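To make the fragmentation side concrete, here is a toy sketch of fragmentation-aware placement. It is only loosely in the spirit of FGD and is not the paper's metric or algorithm: the fragmentation measure, the typical_demands mix, and the function names are all simplifying assumptions made for this note. Capacities are integers in percent of a GPU to avoid floating-point noise.

```python
def fragmentation(free, typical_demands):
    """Free capacity (in % of a GPU) stranded on GPUs too small for a
    typical demand, averaged over the typical demands."""
    stranded = [sum(f for f in free if f < d) for d in typical_demands]
    return sum(stranded) / len(typical_demands)


def place(free, demand, typical_demands):
    """Put `demand` on the feasible GPU that leaves the least fragmentation.
    Returns the new free list, or None if no single GPU can host the task."""
    best_score, best_free = None, None
    for i, f in enumerate(free):
        if f < demand:
            continue  # this GPU cannot host the task
        candidate = list(free)
        candidate[i] = f - demand
        score = fragmentation(candidate, typical_demands)
        if best_score is None or score < best_score:
            best_score, best_free = score, candidate
    return best_free


if __name__ == "__main__":
    free = [100, 70, 30]   # one idle GPU, one with 0.7 free, one with 0.3 free
    typical = [70, 100]    # hypothetical mix: 0.7-GPU and whole-GPU tasks
    print(fragmentation(free, typical))   # 65.0
    print(place(free, 30, typical))       # [100, 70, 0]
```

The 0.3-GPU task lands on the GPU that has exactly 0.3 free, so the 0.7 slot and the whole GPU stay usable for later tasks. Placing it on the idle GPU instead would leave 0.7 free there and no room for a whole-GPU task.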
Effective throughput is the key abstraction. It collapses the complexity of a heterogeneous allocation into a single number per job: the job's throughput on each GPU type, weighted by the fraction of time it spends on that type. This lets Gavel reuse the structure of existing policies (fairness = max-min, makespan = min-max duration) while making them heterogeneity-aware. The insight is that most policies are already functions of throughput; Gavel just generalizes what “throughput” means.
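A small sketch of the idea, as I understand it from the paper: effective throughput is the time-weighted sum of a job's per-GPU-type throughputs, and the max-min fairness objective is the worst job's effective throughput relative to what it would get under an equal share. The job names, throughput numbers, and both allocations below are invented for illustration. Gavel finds the allocation by solving an optimization problem, whereas this sketch only evaluates two hand-picked allocations.

```python
def effective_throughput(throughputs, allocation):
    """Sum over GPU types of (throughput on that type) * (time fraction on it)."""
    return sum(throughputs[g] * allocation.get(g, 0.0) for g in throughputs)


def is_valid(allocations, capacity, eps=1e-9):
    """Each job runs at most 100% of the time; no GPU type is oversubscribed."""
    jobs_ok = all(sum(a.values()) <= 1 + eps for a in allocations.values())
    gpus_ok = all(
        sum(a.get(g, 0.0) for a in allocations.values()) <= capacity[g] + eps
        for g in capacity
    )
    return jobs_ok and gpus_ok


def min_normalized_throughput(throughputs, allocations, baseline):
    """Max-min objective value: the worst job's effective throughput
    relative to its throughput under the baseline allocation."""
    return min(
        effective_throughput(throughputs[j], allocations[j])
        / effective_throughput(throughputs[j], baseline[j])
        for j in throughputs
    )


# Hypothetical samples/sec per GPU type: "a" jobs speed up 4x on a V100,
# "b" jobs only 1.5x.
THROUGHPUT = {
    "a1": {"V100": 40, "K80": 10}, "a2": {"V100": 40, "K80": 10},
    "b1": {"V100": 12, "K80": 8},  "b2": {"V100": 12, "K80": 8},
}
CAPACITY = {"V100": 1, "K80": 1}  # one GPU of each type, time-shared
EQUAL = {j: {"V100": 0.25, "K80": 0.25} for j in THROUGHPUT}
AWARE = {
    "a1": {"V100": 0.35, "K80": 0.0}, "a2": {"V100": 0.35, "K80": 0.0},
    "b1": {"V100": 0.15, "K80": 0.5}, "b2": {"V100": 0.15, "K80": 0.5},
}

if __name__ == "__main__":
    for name, alloc in (("equal", EQUAL), ("aware", AWARE)):
        print(name, is_valid(alloc, CAPACITY),
              round(min_normalized_throughput(THROUGHPUT, alloc, EQUAL), 3))
```

With these made-up numbers, the heterogeneity-aware allocation leaves every job better off than the equal share: the "a" jobs reach about 1.12x their baseline and the "b" jobs about 1.16x. The "a" jobs give up K80 time they barely benefit from in exchange for a bit more V100 time.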