llm frameworks are abstractions over mostly stateless apis
like seemingly everyone in the industry of building "something as a service", i am also writing an autonomous agent based on an llm, with capabilities augmented by mcp servers. most of my colleagues hit the ground running, and to move faster, we all picked up a framework for building agents. strands and the claude agent sdk were two popular choices. i was a little dubious at first, because it didn't feel like we needed a framework. but we were able to get started quickly and turn our focus to improving our mcp servers.
the problem: tracing through frameworks
i was playing around with adding tracing to our agent, because i thought it would be cool to have some visibility into the whole life cycle of a single "turn" of an agent conversation, e.g., when a user sends a message through a slackbot, being able to trace that entire interaction all the way down to the llm calls and mcp calls, then back out to the user. pretty quickly i hit a few brick walls.
we used temporal, and since most of the workflow runs in temporal, we had some visibility into how the activities were called. but when i added tracing, both the temporal framework and the llm agent framework limited where i could instrument code. i got weird-looking "1ms" spans, and the llm framework constrained me to pre/post hooks, which was only okay. and there was no way at all to propagate the trace id down to the mcp servers.
rolling my own agent
i spent a little time thinking, "well, what if i just rolled my own agent framework?" it didn't seem too bad, since really it's just a simple implementation of the "ReAct" pattern: call the llm, run whatever tool it asks for, feed the result back, and loop until it answers.
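for reference, that loop fits in a few lines. this is a hedged sketch, not any sdk's api: `call_llm` is a hypothetical stand-in for whatever client wraps the stateless messages endpoint, and the reply shape (`type`, `name`, `input`) is made up for illustration.

```python
# minimal ReAct-style loop: call the llm, execute any tool it asks for,
# feed the result back as a message, and repeat until it replies in text.
# call_llm and the reply dict shape are hypothetical stand-ins.

def run_turn(call_llm, tools, messages, max_steps=10):
    for _ in range(max_steps):
        reply = call_llm(messages=messages, tools=list(tools))
        if reply["type"] == "tool_use":
            # the llm asked for a tool: run it and append the result
            result = tools[reply["name"]](**reply["input"])
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "tool", "content": str(result)})
        else:
            # plain text reply: the turn is done
            messages.append({"role": "assistant", "content": reply["text"]})
            return reply["text"]
    raise RuntimeError("agent did not finish within max_steps")
```

the caller owns `messages` across iterations, which is exactly the statefulness the frameworks were hiding.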
honestly, i'm not too upset with how it came out. it didn't even feel like much more work than using strands or the claude agent sdk. but the most surprising thing was actually interacting with the llm apis directly.
there is basically just one api call, something like:
POST /messages
{
  "system": "prompt...",
  "tools": [...],
  "messages": [...]
}
stateless api realization
the thing that surprised me was that the apis are stateless: it was my responsibility as the caller to maintain the conversation history, tools, etc. and this got me thinking and asking things like: if i have to provide tools with every request, why can't i do fun things like filter the tool list down to the ones the llm is most likely to need? or why can't i do more complex "conversation compaction" myself? why shouldn't i? why couldn't i just use a key-value store to maintain conversation history that is super easy to shard and scale?
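the tool-filtering idea, for instance, becomes trivial once you own the request. here's a toy sketch where the relevance score is just keyword overlap between the latest user message and each tool's description (the scoring is naive and purely illustrative, not a recommendation):

```python
# since the caller sends the tool list on every request, it can shrink it.
# naive sketch: score each tool description against the latest user message
# by keyword overlap and keep only the top few before calling the llm.

def filter_tools(tools, user_message, keep=3):
    words = set(user_message.lower().split())

    def score(tool):
        return len(words & set(tool["description"].lower().split()))

    return sorted(tools, key=score, reverse=True)[:keep]
```

a real version might use embeddings instead, but the point stands: this is a per-request decision the caller gets to make, not the framework.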
why this changes things
all the different ideas this opened up were pretty interesting, and i really didn't see any good reason not to try them. then i felt a little confused, because using the frameworks we were pretty constrained, and the people writing those are very, very smart. but i started to feel pretty good that writing my own framework for this wasn't a terrible idea.
really, even writing a general "agent as a service" that could manage conversations for anyone didn't seem like too much of a stretch either. everyone at work was basically building the same exact thing, and here i could do it with a super basic crud api. but because we were all using the frameworks and not looking any deeper, each of our approaches felt like the natural one. i feel like we were all led astray. the stateless api was actually pretty liberating, and simple.
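that crud api really is about this small. a minimal sketch, with an in-memory dict standing in for the key-value store (anything keyed by conversation id, like redis or dynamodb, slots in the same way):

```python
# conversation history as plain key-value crud. the "store" here is a dict,
# but each conversation is independent and keyed by id, so this shards
# trivially onto any real key-value store.

class ConversationStore:
    def __init__(self):
        self._db = {}  # conversation_id -> list of message dicts

    def create(self, conversation_id):
        self._db[conversation_id] = []

    def append(self, conversation_id, message):
        self._db[conversation_id].append(message)

    def read(self, conversation_id):
        # return a copy so callers can't mutate stored history
        return list(self._db[conversation_id])

    def delete(self, conversation_id):
        del self._db[conversation_id]
```

read the history, append the user message, call the stateless api, append the reply: that's the whole service loop.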
so rolling my own "framework" was not so complicated, and in turn it let me build a really powerful service: basically, a scalable service in place of the framework sdk most folks were using.
i don’t really have a good ending here. and dinner’s ready.