Arsalan Mosenia
Infra · LLMs / 6 min read

GPU Pools Without Kubernetes

By Arsalan Mosenia · Published

When infrastructure starts to feel complicated, the reflex is to reach for a scheduler. For small GPU fleets this is often the wrong move: it trades a tractable operational problem for an opaque control plane that now owns your uptime.

For workloads that are long-running inference rather than short batch jobs, a straightforward pool pattern carries you a lot further than people expect. The pattern has four parts: a queue, a pool of worker boxes, a shared health check, and a way to mark a box as draining.
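To make the moving parts concrete, here is a minimal in-memory sketch of the four pieces in Python. It is an illustration under stated assumptions, not a production design: the names `Worker`, `GpuPool`, `max_in_flight`, and the `gpu-a`/`gpu-b` box names are all hypothetical, the health check is assumed to be an external process that reports results via `record_health`, and a real pool would need persistence, timeouts, and concurrency control.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    healthy: bool = True    # updated by the shared health check (assumed external)
    draining: bool = False  # set by an operator before taking the box out
    in_flight: int = 0      # requests currently running on this box


class GpuPool:
    """Hypothetical pool: a FIFO queue in front of a set of worker boxes."""

    def __init__(self, workers, max_in_flight=1):
        self.workers = {w.name: w for w in workers}
        self.queue = deque()
        self.max_in_flight = max_in_flight

    def submit(self, request):
        self.queue.append(request)

    def drain(self, name):
        # A draining box finishes its current work but receives nothing new.
        self.workers[name].draining = True

    def record_health(self, name, ok):
        self.workers[name].healthy = ok

    def _eligible(self):
        return [
            w for w in self.workers.values()
            if w.healthy and not w.draining and w.in_flight < self.max_in_flight
        ]

    def dispatch(self):
        """Assign queued requests to eligible boxes, least-loaded first.

        Returns a list of (request, worker_name) pairs. Requests stay
        queued when no box is eligible.
        """
        assigned = []
        while self.queue:
            candidates = self._eligible()
            if not candidates:
                break
            worker = min(candidates, key=lambda w: w.in_flight)
            worker.in_flight += 1
            assigned.append((self.queue.popleft(), worker.name))
        return assigned

    def complete(self, name):
        self.workers[name].in_flight -= 1


pool = GpuPool([Worker("gpu-a"), Worker("gpu-b")], max_in_flight=1)
pool.drain("gpu-b")          # gpu-b is going away for maintenance
pool.submit("req-1")
pool.submit("req-2")
first_round = pool.dispatch()  # only gpu-a can take work, and only one request
```

The point of the sketch is that every piece of state is visible: a request that is not moving is either waiting in `queue` or blocked because no box is healthy, non-draining, and under its limit.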

This post covers the version of that pattern I keep reaching for, and the specific failure modes that eventually push you toward something heavier.
