AWS announced Lambda durable functions today in a blog post titled "Build multi-step applications and AI workflows with AWS Lambda durable functions" [0].
The distributed state associated with workflow systems like this makes observability and code upgrades a challenge compared to just having an application orchestrating workflows with progress persisted in a shared database. But if they solve those problems this would be really nice, as I've seen a ton of home-grown systems re-write checkpointing, retries, etc. and usually get it wrong.
Sounds like a step in the right direction. I would like to see an all-up dashboard of everything in the shared state, and good control over upgrades (maybe a mode where in-progress functions can complete on version 1, even if new functions are getting kicked off on version 2, etc.)
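To make the upgrade idea concrete, here is a toy sketch of version pinning: each execution records the code version it started on and always resumes on that version, while new executions pick up whatever is current. Everything here (`Orchestrator`, `HANDLERS`, the method names) is hypothetical and is not how Lambda durable functions are known to behave.

```python
# Hypothetical handlers for two deployed versions of the same workflow.
HANDLERS = {
    1: lambda payload: f"v1 handled {payload}",
    2: lambda payload: f"v2 handled {payload}",
}


class Orchestrator:
    """Toy orchestrator that pins each execution to its starting version."""

    def __init__(self, current_version):
        self.current_version = current_version
        self.pinned = {}  # execution id -> version it started on

    def start(self, execution_id):
        # Record the version at start so later deploys do not affect it.
        self.pinned[execution_id] = self.current_version

    def resume(self, execution_id, payload):
        # Always resume on the pinned version, not the current one.
        version = self.pinned[execution_id]
        return HANDLERS[version](payload)


orch = Orchestrator(current_version=1)
orch.start("a")            # starts on version 1
orch.current_version = 2   # version 2 is deployed while "a" is in flight
orch.start("b")            # new execution starts on version 2

result_a = orch.resume("a", "x")
result_b = orch.resume("b", "x")
print(result_a)
print(result_b)
```

The trade-off with a scheme like this is that old versions have to stay deployable until their last in-flight execution finishes.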
On price, you could definitely do better than $8/million steps, $0.25/GB written and $0.15/GB-month for state storage, but if you were designing something generic on S3/DynamoDB (state + status) to support all use cases at all scales, you'd probably end up spending something around the same order of magnitude.
But if you did that, you'd also have to implement it all yourself. This is a relatively simple checkpointing workflow orchestrator across standard Lambda functions, but with some really nice touch surfaces in the Lambda API itself.
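For a sense of scale, here is a back-of-the-envelope estimate using the three prices quoted above. The workload (1M executions a month, 10 steps each, 4 KB checkpointed per step, 5 GB retained) is made up, and the billing model is simplified to just those three line items; regular Lambda compute charges are not included.

```python
# Prices as quoted in the comment above (USD).
PRICE_PER_MILLION_STEPS = 8.00
PRICE_PER_GB_WRITTEN = 0.25
PRICE_PER_GB_MONTH = 0.15


def monthly_cost(steps, gb_written, gb_stored):
    """Sum of the three quoted line items for one month."""
    step_cost = steps / 1_000_000 * PRICE_PER_MILLION_STEPS
    write_cost = gb_written * PRICE_PER_GB_WRITTEN
    storage_cost = gb_stored * PRICE_PER_GB_MONTH
    return step_cost + write_cost + storage_cost


# Hypothetical workload, chosen only for illustration.
executions = 1_000_000
steps = executions * 10                # 10 steps per execution
gb_written = steps * 4 / 1_000_000     # 4 KB per step, 1e6 KB per GB -> 40 GB
gb_stored = 5                          # average state retained over the month

total = monthly_cost(steps, gb_written, gb_stored)
print(f"${total:.2f} per month")
```

With these assumptions the per-step charge dominates, so the number of steps per execution matters far more than checkpoint size.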
What's only a footnote in the announcement is that this is only available in us-east-2 (Ohio) and for TypeScript/JS + Python at the moment. Basically a public preview release. I look forward to seeing where they take this.
This post seems to have been published in a hurry. The "How it works" section has a bunch of duplication, and I think they should make the blog post exactly once :) Excerpt from the blog post:
> During replay, your code runs from the beginning but skips over completed checkpoints, using stored results instead of re-executing completed operations. This replay mechanism ensures consistency while enabling long-running executions.
>
> ... During replay, your code runs from the beginning but skips over completed checkpoints, using stored results instead of re-executing completed operations. This replay mechanism ensures consistency while enabling long-running executions.
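For anyone unfamiliar with the replay model the excerpt describes, here is a minimal in-memory sketch. It is not the AWS SDK: `ReplayContext`, `step`, and the `store` dict are hypothetical stand-ins, and a real system would persist checkpoints durably rather than in a Python dict.

```python
class ReplayContext:
    """Toy stand-in for a durable execution context."""

    def __init__(self, store):
        self.store = store    # checkpoints: step name -> stored result
        self.executed = []    # steps actually run during this invocation

    def step(self, name, fn):
        if name in self.store:
            # Replay: skip the work and reuse the stored result.
            return self.store[name]
        result = fn()
        self.store[name] = result   # checkpoint before moving on
        self.executed.append(name)
        return result


def workflow(ctx, crash_before_charge=False):
    order = ctx.step("reserve", lambda: {"order_id": 42})
    if crash_before_charge:
        raise RuntimeError("simulated crash")
    return ctx.step("charge", lambda: f"charged-{order['order_id']}")


store = {}  # survives across invocations in this toy example

# First invocation: "reserve" completes, then the process "crashes".
first = ReplayContext(store)
try:
    workflow(first, crash_before_charge=True)
except RuntimeError:
    pass

# Second invocation: the code runs from the top, but "reserve" is skipped.
second = ReplayContext(store)
result = workflow(second)
print(first.executed, second.executed, result)
```

The catch with any replay design is that code between steps must be deterministic, since it runs again on every replay.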
This is really exciting. Step Functions was a big improvement over SWF and the Flow framework, but declarative workflow authoring sucks from a type-safety standpoint. Workflows-as-code is the way to go, and that was missing from AWS. Can't wait to build on top of this.
[0] https://aws.amazon.com/blogs/aws/build-multi-step-applicatio...