Topic: Infra
Practical infrastructure for small AI teams: serving, queueing, observability.
3 posts
LLMs Infra
Serving LLMs on a Budget: Notes From a Small Team
Practical patterns for running open-weight models in production when you don't have a research cluster behind you.
• 7 min
Infra LLMs
GPU Pools Without Kubernetes
You probably don't need a scheduler. You need a queue, a health check, and a way to drain a box gracefully.
• 6 min
RAG Infra
Latency Budgets for RAG Applications
If you haven't written the budget down, the model is deciding it for you. Here is the template I use.
• 5 min