Topic
LLMs
Notes on serving, scaling, and shaping language models in production.
4 posts
LLMs Infra
Serving LLMs on a Budget: Notes From a Small Team
Practical patterns for running open-weight models in production when you don't have a research cluster behind you.
• 7 min
RAG LLMs
Retrieval Is Mostly Data Work
After a year of building RAG systems, the pattern is clear: the interesting problems are upstream of the vector database.
• 9 min
Evals LLMs Security
Building an Eval Loop You Actually Trust
The fastest way to ship a bad model is to grade it with a worse one. A practical guide to evaluation that survives contact with production.
• 11 min
Infra LLMs
GPU Pools Without Kubernetes
You probably don't need a scheduler. You need a queue, a health check, and a way to drain a box gracefully.
• 6 min