Tool design, planning loops, and the unsexy work of making agents reliable.
1 post
The interesting failure modes in agent systems come from tool surfaces that look fine to a human but are confusing to a model.