LLMs
Serving LLMs on a Budget: Notes From a Small Team
Practical patterns for running open-weight models in production when you don't have a research cluster behind you.
7 min read
RAG
Retrieval Is Mostly Data Work
After a year of building RAG systems, the pattern is clear: the interesting problems live upstream of the vector database.
9 min read
Evals
Building an Eval Loop You Actually Trust
The fastest way to ship a bad model is to grade it with a worse one. A practical guide to evaluation that survives contact with production.
11 min read
Infra
GPU Pools Without Kubernetes
You probably don't need a scheduler. You need a queue, a health check, and a way to drain a box gracefully.
6 min read
Agents
Agents Need Boring Tools
The interesting failure modes in agent systems come from tool surfaces that look fine to a human but confuse a model.
8 min read
RAG
Latency Budgets for RAG Applications
If you haven't written the budget down, the model is deciding it for you. Here is the template I use.
5 min read