<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Arsalan Mosenia</title><description>Notes on AI/ML engineering — LLMs, retrieval, evaluation, infrastructure, and agents.</description><link>https://mosenia.com/</link><language>en-us</language><item><title>Serving LLMs on a Budget: Notes From a Small Team</title><link>https://mosenia.com/posts/serving-llms-on-a-budget/</link><guid isPermaLink="true">https://mosenia.com/posts/serving-llms-on-a-budget/</guid><description>Practical patterns for running open-weight models in production when you don&apos;t have a research cluster behind you.</description><pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate><category>LLMs</category><category>Infra</category><author>arsalan@mosenia.com (Arsalan Mosenia)</author></item><item><title>Retrieval Is Mostly Data Work</title><link>https://mosenia.com/posts/retrieval-is-mostly-data-work/</link><guid isPermaLink="true">https://mosenia.com/posts/retrieval-is-mostly-data-work/</guid><description>After a year of RAG systems, the pattern is clear: the interesting problems are upstream of the vector database.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><category>RAG</category><category>LLMs</category><author>arsalan@mosenia.com (Arsalan Mosenia)</author></item><item><title>Building an Eval Loop You Actually Trust</title><link>https://mosenia.com/posts/building-an-eval-loop-you-trust/</link><guid isPermaLink="true">https://mosenia.com/posts/building-an-eval-loop-you-trust/</guid><description>The fastest way to ship a bad model is to grade it with a worse one. A practical guide to evaluation that survives contact with production.</description><pubDate>Wed, 11 Mar 2026 00:00:00 GMT</pubDate><category>Evals</category><category>LLMs</category><category>Security</category><author>arsalan@mosenia.com (Arsalan Mosenia)</author></item><item><title>GPU Pools Without Kubernetes</title><link>https://mosenia.com/posts/gpu-pools-without-kubernetes/</link><guid isPermaLink="true">https://mosenia.com/posts/gpu-pools-without-kubernetes/</guid><description>You probably don&apos;t need a scheduler. You need a queue, a health check, and a way to drain a box gracefully.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><category>Infra</category><category>LLMs</category><author>arsalan@mosenia.com (Arsalan Mosenia)</author></item><item><title>Agents Need Boring Tools</title><link>https://mosenia.com/posts/agents-need-boring-tools/</link><guid isPermaLink="true">https://mosenia.com/posts/agents-need-boring-tools/</guid><description>The interesting failure modes in agent systems come from tool surfaces that look fine to a human and confusing to a model.</description><pubDate>Sat, 14 Feb 2026 00:00:00 GMT</pubDate><category>Agents</category><category>Security</category><category>Enterprise AI</category><author>arsalan@mosenia.com (Arsalan Mosenia)</author></item><item><title>Latency Budgets for RAG Applications</title><link>https://mosenia.com/posts/latency-budgets-for-rag/</link><guid isPermaLink="true">https://mosenia.com/posts/latency-budgets-for-rag/</guid><description>If you haven&apos;t written the budget down, the model is deciding it for you. Here is the template I use.</description><pubDate>Fri, 30 Jan 2026 00:00:00 GMT</pubDate><category>RAG</category><category>Infra</category><author>arsalan@mosenia.com (Arsalan Mosenia)</author></item></channel></rss>