LLMs

Serving LLMs on a Budget: Notes From a Small Team

Practical patterns for running open-weight models in production when you don't have a research cluster behind you.

7 min
RAG

Retrieval Is Mostly Data Work

After a year of building RAG systems, the pattern is clear: the interesting problems are upstream of the vector database.

9 min
Evals

Building an Eval Loop You Actually Trust

The fastest way to ship a bad model is to grade it with a worse one. A practical guide to evaluation that survives contact with production.

11 min
Infra

GPU Pools Without Kubernetes

You probably don't need a scheduler. You need a queue, a health check, and a way to drain a box gracefully.

6 min
Agents

Agents Need Boring Tools

The interesting failure modes in agent systems come from tool surfaces that look fine to a human but confuse a model.

8 min
RAG

Latency Budgets for RAG Applications

If you haven't written the budget down, the model is deciding it for you. Here is the template I use.

5 min