how temporal uses rust for deterministic workflow replay

i spent a good portion of my thursday writing a temporal SDK using just their protobuf in rust because there wasn’t an official one. turns out temporal does something else with rust that is really fascinating.

building from scratch with protobuf

temporal has a lot of official sdks in various languages, mainly go, python, and typescript. i didn’t see one in rust, and i saw brief notes saying an official rust sdk was not on their roadmap. but they publish official protobuf definitions for talking to a temporal server, so i figured, let’s start building a temporal client in rust.

very quickly, you realize that within workflows, determinism is critical: being able to replay past temporal events to produce the current state is a core part of how temporal works. this was an interesting exercise by itself, because i had a vague notion of what this meant when people discussed it, but it was never very clear to me. so, on the surface, i figured: we get events, we mutate some internal state, and so we’re building a relatively simple state machine.
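to make the “events in, state out” idea concrete, here is a minimal sketch of that naive state machine. the `Event` variants and the `WorkflowState` fields are made up for illustration (real temporal histories have many more event types); the only point is that replaying the same ordered history always produces the same state.

```rust
// hypothetical, heavily simplified event types (not temporal's real protobuf messages)
#[derive(Debug, Clone, PartialEq)]
enum Event {
    WorkflowStarted,
    TimerStarted { id: String },
    TimerFired { id: String },
    WorkflowCompleted,
}

// the state we rebuild purely from the event history
#[derive(Debug, Default, PartialEq)]
struct WorkflowState {
    started: bool,
    pending_timers: Vec<String>,
    completed: bool,
}

impl WorkflowState {
    // apply one event; no clocks, randomness, or I/O, so the result is deterministic
    fn apply(&mut self, event: &Event) {
        match event {
            Event::WorkflowStarted => self.started = true,
            Event::TimerStarted { id } => self.pending_timers.push(id.clone()),
            Event::TimerFired { id } => self.pending_timers.retain(|t| t != id),
            Event::WorkflowCompleted => self.completed = true,
        }
    }
}

// fold the whole history, in order, into the current state
fn replay(history: &[Event]) -> WorkflowState {
    let mut state = WorkflowState::default();
    for event in history {
        state.apply(event);
    }
    state
}
```

any worker that calls `replay` on the same history ends up with an identical `WorkflowState`, which is the property everything else builds on.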

async rust for workflow state machine

as i was thinking a bit about what kind of DSL or how i would construct a workflow that would be converted into a state machine, i immediately recalled how in rust, async code is compiled into a state machine. each await point is basically a yield point, and the compiler generates an enum with all the states and transitions for you. so i figured, if i can make a Future out of this “event replay”, i could just write workflows in rust.
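a small std-only sketch of the “each await is a yield point” idea. `YieldOnce`, `two_steps`, and `count_polls` are illustrative names i made up, not part of any library. the async fn has two await points, so driving it by hand takes three polls: one per suspension plus the final one that returns the value.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// a future that is pending the first time it is polled and ready afterwards
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

// the compiler turns this into a state machine with a state per await point
async fn two_steps() -> u32 {
    YieldOnce { yielded: false }.await; // suspension point 1
    YieldOnce { yielded: false }.await; // suspension point 2
    42
}

// a waker that does nothing; enough for polling by hand
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// poll a future in a loop until it finishes, counting how many polls it took
fn count_polls<F: Future>(fut: F) -> (usize, F::Output) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (polls, out);
        }
    }
}
```

calling `count_polls(two_steps())` resumes the generated state machine from where it last stopped on every poll, which is the behavior a replaying workflow wants.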

so that was where i got started. the plan was that each await in the workflow would be for an activity execution, a timer, or something similar. when polled, the future looks for its start event in the history (in order). if it does not exist, we create the command to send to the temporal server, using a deterministic ID so that a later replay can see that we’ve already started it. if it has already been started, the next step is to check whether it has completed, so we look in the event history for a result event. that result event gets recorded once the activity worker process finishes the activity. if there’s no result event, we’re still waiting on a result, and so our future returns pending.
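here is a rough sketch of that future, under a lot of assumptions. `Event`, `Command`, `WorkflowCtx`, and `ActivityFuture` are hypothetical names for illustration, not temporal’s API or my actual prototype, and the sketch matches events by ID instead of enforcing history order. the deterministic ID is just a per-workflow sequence counter, which is stable as long as the workflow code awaits things in the same order on every replay.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// hypothetical, trimmed-down history events and commands
#[derive(Debug, Clone, PartialEq)]
enum Event {
    ActivityScheduled { id: String },
    ActivityCompleted { id: String, result: String },
}

#[derive(Debug, Clone, PartialEq)]
enum Command {
    ScheduleActivity { id: String, name: String },
}

struct Replay {
    history: Vec<Event>,    // events already recorded by the server
    commands: Vec<Command>, // new commands produced by this pass
    next_seq: u32,          // source of deterministic IDs
}

#[derive(Clone)]
struct WorkflowCtx(Rc<RefCell<Replay>>);

impl WorkflowCtx {
    fn new(history: Vec<Event>) -> Self {
        WorkflowCtx(Rc::new(RefCell::new(Replay { history, commands: Vec::new(), next_seq: 0 })))
    }

    // same call order on every replay => same ID on every replay
    fn activity(&self, name: &str) -> ActivityFuture {
        let mut replay = self.0.borrow_mut();
        replay.next_seq += 1;
        let id = format!("activity-{}", replay.next_seq);
        ActivityFuture { ctx: self.clone(), id, name: name.to_string() }
    }

    fn commands(&self) -> Vec<Command> {
        self.0.borrow().commands.clone()
    }
}

struct ActivityFuture {
    ctx: WorkflowCtx,
    id: String,
    name: String,
}

impl Future for ActivityFuture {
    type Output = String;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<String> {
        let mut replay = self.ctx.0.borrow_mut();
        // 1. a result event in the history means we are done
        for event in &replay.history {
            if let Event::ActivityCompleted { id, result } = event {
                if *id == self.id {
                    return Poll::Ready(result.clone());
                }
            }
        }
        // 2. not started yet: emit the schedule command (once)
        let scheduled = replay.history.iter().any(|event| {
            matches!(event, Event::ActivityScheduled { id } if *id == self.id)
        });
        let command = Command::ScheduleActivity { id: self.id.clone(), name: self.name.clone() };
        if !scheduled && !replay.commands.contains(&command) {
            replay.commands.push(command);
        }
        // 3. started (or just requested) but no result yet
        Poll::Pending
    }
}

// an example workflow written as plain async rust
async fn workflow(ctx: WorkflowCtx) -> String {
    let charged = ctx.activity("charge_card").await;
    let receipt = ctx.activity("send_receipt").await;
    format!("{}, {}", charged, receipt)
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// replay one history: poll the workflow once and collect the new commands
fn run_workflow_task(history: Vec<Event>) -> (Poll<String>, Vec<Command>) {
    let ctx = WorkflowCtx::new(history);
    let mut fut = Box::pin(workflow(ctx.clone()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let outcome = fut.as_mut().poll(&mut cx);
    (outcome, ctx.commands())
}
```

with an empty history, `run_workflow_task` returns pending plus a single schedule command for `activity-1`. feeding it a history in which `activity-1` has completed makes the same workflow code skip straight to scheduling `activity-2`.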

this pattern of creating a deterministic ID, creating the start command, and waiting for a completion event was the basic idea of the future we created. i did this with timers pretty quickly, too. signals and queries were a little more difficult, but i had at least a “good enough” version of them working in my prototype. the main challenge was just remembering that workflow decisions are based on the order of events, and signals can influence those decisions.

what temporal does

long story short, turns out temporal already kinda does all that. they actually base several of their sdks off of a rust “core” which does exactly what i’m doing, just obviously more refined. they compile that and call it over FFI from languages like python and typescript (the go sdk, as far as i can tell, is its own native implementation) to do their workflow processing with rust, for the same reasons i was doing it: you can use rust to create and manage your “state machine”. it’s definitely a really cool use of rust, and a nice way to manage a polyglot set of libraries that share common functionality.

i wouldn’t say any of this was time wasted, though, because i learned a lot about temporal. basically their whole thing is they have a “workflow” server, where each workflow maintains a rigid history of events. the contract is that when any workflow worker picks up work for a workflow, it needs to be able to deterministically rebuild the state from the ordered events. the durable storage and resiliency of these events is the primary secret sauce of temporal, which i guess is why some workflows can live for a very long time and be run on any worker.

but a rust temporal sdk would be cool

but what was really cool in what i made was that since everything was a future, i could use things like futures::join! and futures::select! and everything still worked and felt really natural. i wonder why they never bothered to publish a rust sdk for temporal?
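a sketch of why combinators compose so nicely with replay futures. to keep it std-only, this hand-rolls a tiny two-way join (`Join2`) instead of using `futures::join!`; `Step` is a hypothetical stand-in for a replay future that either finds its result in the history or records a command. because a join polls both branches on every poll, both commands get emitted in a single pass before the workflow yields.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Commands = Rc<RefCell<Vec<String>>>;

// hypothetical replay step: ready if the history already holds a result,
// otherwise it records a start command (once) and stays pending
struct Step {
    id: &'static str,
    recorded_result: Option<u32>,
    commands: Commands,
}

impl Future for Step {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        match self.recorded_result {
            Some(value) => Poll::Ready(value),
            None => {
                let mut commands = self.commands.borrow_mut();
                if !commands.iter().any(|c| c.as_str() == self.id) {
                    commands.push(self.id.to_string());
                }
                Poll::Pending
            }
        }
    }
}

// minimal stand-in for futures::join!: polls both children on every poll
struct Join2<A: Future, B: Future> {
    a: Pin<Box<A>>,
    b: Pin<Box<B>>,
    a_out: Option<A::Output>,
    b_out: Option<B::Output>,
}

// the children are boxed and the outputs are never pinned, so moving Join2 is fine
impl<A: Future, B: Future> Unpin for Join2<A, B> {}

impl<A: Future, B: Future> Join2<A, B> {
    fn new(a: A, b: B) -> Self {
        Join2 { a: Box::pin(a), b: Box::pin(b), a_out: None, b_out: None }
    }
}

impl<A: Future, B: Future> Future for Join2<A, B> {
    type Output = (A::Output, B::Output);
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        if this.a_out.is_none() {
            if let Poll::Ready(value) = this.a.as_mut().poll(cx) {
                this.a_out = Some(value);
            }
        }
        if this.b_out.is_none() {
            if let Poll::Ready(value) = this.b.as_mut().poll(cx) {
                this.b_out = Some(value);
            }
        }
        if this.a_out.is_some() && this.b_out.is_some() {
            Poll::Ready((this.a_out.take().unwrap(), this.b_out.take().unwrap()))
        } else {
            Poll::Pending
        }
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// poll two joined steps once and report the outcome plus the emitted commands
fn poll_joined(first: Option<u32>, second: Option<u32>) -> (Poll<(u32, u32)>, Vec<String>) {
    let commands: Commands = Rc::new(RefCell::new(Vec::new()));
    let mut joined = Join2::new(
        Step { id: "activity-1", recorded_result: first, commands: commands.clone() },
        Step { id: "activity-2", recorded_result: second, commands: commands.clone() },
    );
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let outcome = Pin::new(&mut joined).poll(&mut cx);
    let emitted = commands.borrow().clone();
    (outcome, emitted)
}
```

`poll_joined(None, None)` stays pending but emits both start commands at once, which is the “run these two activities in parallel” behavior you would expect from a join.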