I bought the book, I read the book, I've used DynamoDB for a while. It didn't change my mind. DynamoDB makes tradeoffs in order to run at massive scale, but scale isn't a problem many people need to solve when 2TB of RAM fits in a single box. Meanwhile I need to handle eventual consistency, an analytics pipeline, another database for fuzzy search, another geo lookup database, Lambda functions to do aggregations, and a pile of custom code, all while giving up tooling that's readily available in the RDBMS world.
In a world where Opex is much higher than Capex, DynamoDB might make sense, but for me server costs are 5% of dev costs. And even if it works from a cost perspective, how many AWS services have their console experience ruined by DynamoDB? The UI tricks you into thinking it's a data table with sortable columns, but no! DynamoDB limitations strike again and you are off on a journey of endless paging. The cost savings come at the expense of the user.
DynamoDB also isn't fast. 20ms for a query isn't fast, 30ms for an insert isn't fast. Yes, it's amazingly consistent and faster than other systems holding 500TB, but that isn't a use case for many users.
Others are comparing DynamoDB to Redis and Cassandra. It has additional limitations. These are fairly clearly spelled out now but maybe weren't highlighted as prominently a few years back. (I say that because I inherited an application that made heavy use of DynamoDB but turned out not to be a great fit for it.)
- It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.
- You can store a maximum of 400 KB of data in one row.
- You can get a maximum of 1 MB of data returned in a single query.
So it's mostly good for high-data-throughput applications, and then only if your high data throughput consists of large numbers of small records, processed a few at a time. This surely describes an important class of workloads. You may suffer if your workload isn't in this class.
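To make that 1 MB limit concrete: a Query that matches more than a megabyte returns a partial page plus a LastEvaluatedKey, and it's on you to keep calling back until it's gone. A minimal sketch with boto3 (table and key names here are made up):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

def query_all(pk_value):
    """Collect every item under one partition key, following the 1 MB pages."""
    items = []
    kwargs = {"KeyConditionExpression": Key("pk").eq(pk_value)}
    while True:
        resp = table.query(**kwargs)
        items.extend(resp["Items"])
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key  # resume where the last page stopped
    return items
```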
Another annoyance is that (in my experience) one of the most common errors you will encounter is ProvisionedThroughputExceededException, when your workload changes faster than the auto-scaling can react. Until last year you couldn't test this scenario offline with the DynamoDB Local service, because DynamoDB Local didn't implement capacity limits.
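The usual workaround is wrapping writes in your own backoff on top of whatever retries the SDK already does internally. A rough sketch with boto3 (table name and backoff numbers are arbitrary):

```python
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

def put_with_backoff(item, max_attempts=5):
    """Retry throttled writes with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return table.put_item(Item=item)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise  # some other failure, don't retry
            time.sleep((2 ** attempt) * 0.1)  # 100ms, 200ms, 400ms, ...
    raise RuntimeError("still throttled after retries")
```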
> - It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.
That is _infuriating_
It's documented, but it is so surprising when you first hit it. Sometimes empty values have semantics attached to them; I don't want to scrub them out.
I think that basically describes DDB in a nutshell. You think it’s like Mongo, only on AWS, but then you slowly find out it’s much, much worse (for most use cases that you’d go with Mongo).
Note that Cassandra has similar limitations with data size and throughput, but they aren't enforced or documented (because they depend on your particular setup), and your queries just fail or, worse, make all queries to the same node in the cluster fail (fun times with large wide rows).
The rich data types in Dynamo are quite strange; since they're basically useless for querying, I'm not sure why you would use them. Maybe I'm missing something...
The rich data types can be useful for filtering, I guess? When you are running a query against a certain hash key and want the first record (sorted by range key) that meets a condition placed on a nested property of a map or whose "tags" property is a string set containing a certain member, for example.
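Something like this, as a sketch with boto3 (table, key, and attribute names are invented). Worth remembering that the filter runs after the read, so Limit and the 1 MB cap apply to the unfiltered items:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("events")  # hypothetical table

# First item under this hash key (sorted by range key) whose "tags" string set
# contains "urgent" and whose nested map attribute details.state equals "open".
resp = table.query(
    KeyConditionExpression=Key("pk").eq("customer#123"),
    FilterExpression=Attr("tags").contains("urgent") & Attr("details.state").eq("open"),
    Limit=50,  # Limit counts items read, not items that survive the filter
)
first_match = resp["Items"][0] if resp["Items"] else None
```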
If you treat DynamoDB as a DBMS, you’re going to be disappointed (for the reasons you mention). But if you think of it as a highly-durable immediately-consistent btree in the cloud, it’s amazing. DynamoDB is closer to Redis than MySQL. Amazon does it a disservice by putting it in the databases category.
It's not just that it's put in the database category, but that its champions at AWS make statements like "if you are utilising RDBMS you are living in the past", or that "there are very few use cases to choose Postgres over DynamoDB".
DynamoDB is like Redis without the fun data structures, the fantastic CLI and discoverability, the useful configurable tradeoff between fast and consistent, and really much-needed features such as listing your keys.
Daniel, I'm a big fan of yours but disagree with this take :).
It's definitely a database. The modeling principles are different, and you won't get some of the niceties you get with a RDBMS, but it still allows for flexible querying and more.
S3 and DDB are incredibly similar. Their fundamental operators are the same: key-value get/put and ordered list, and their consistency is roughly the same.
What differentiates DDB and S3 the most is cost and performance.
They're both highly-durable primitive data structures in the cloud, with a few extra features attached.
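You can see the parallel directly in the SDK calls; bucket, table, and attribute names here are placeholders:

```python
import boto3

s3 = boto3.client("s3")
ddb = boto3.client("dynamodb")

# S3: key-value put/get plus an ordered list over a key prefix
s3.put_object(Bucket="my-bucket", Key="users/42", Body=b'{"name": "a"}')
obj = s3.get_object(Bucket="my-bucket", Key="users/42")["Body"].read()
listing = s3.list_objects_v2(Bucket="my-bucket", Prefix="users/")

# DynamoDB: the same three operations, over a partition key and sort key
ddb.put_item(TableName="users",
             Item={"pk": {"S": "users"}, "sk": {"S": "42"}, "name": {"S": "a"}})
item = ddb.get_item(TableName="users",
                    Key={"pk": {"S": "users"}, "sk": {"S": "42"}})["Item"]
page = ddb.query(TableName="users",
                 KeyConditionExpression="pk = :p AND begins_with(sk, :s)",
                 ExpressionAttributeValues={":p": {"S": "users"}, ":s": {"S": "4"}})
```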
No, I meant a DBMS. Almost all the things you'd expect in a DBMS are not there. Take a look at most of the comments in this thread. Everyone expecting DBMS things, and complaining that they're not there.
Fair enough! I think that's a reasonable position.
IMO, there are two times you should absolutely default to DynamoDB:
- Very high scale workloads, due to its scaling characteristics
- Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.
You can use DynamoDB for almost all OLTP workloads, but outside of those two categories, I won't fault you for choosing an RDBMS.
Agree that DynamoDB isn't _blazing_ fast. It's more that it's extremely consistent. You're going to get ~10 millisecond response times when you have 1GB of data or when you have 10 TB of data, and that's pretty attractive.
True! I'm not a huge fan of Aurora Serverless and the Data API. The scaling for Aurora Serverless is slow enough that it's not really serverless, IMO. And the Data API adds a good bit of latency and has a non-standard request & response format, so it's hard to use with existing libraries. But it's definitely an option for those that want Lambda + RDBMS.
The RDS Proxy is _hopefully_ a better option in this regard but still early.
Differing opinion - I think RDS Proxy is the wrong approach. Adding an additional fixed cost service to enable lambda seems like an indicator of a bad architecture. In this case the better approach would likely be to just use a Fargate container which would have a similar cost and fewer moving parts.
By the time you pay a fixed cost for the proxy on top of what you already pay for the RDS server, it'd be a far simpler architecture with less moving parts to just run a Fargate container (or better yet, AWS would offer a Google Cloud Run competitor)
The Data API, while still rough around the edges, at least keeps the solution more "serverless-y". Over time it should get easier to work with as tooling improves. At the very least, it won't be more difficult to work with than DynamoDB was initially with its different paradigm.
For services that truly require consistently low latency, lambda shouldn't be used anyway, so the added latency of the data api shouldn't be a big deal IMO.
For those reasons, I view the RDS Proxy as an ugly stopgap that enables poor architecture, whereas the Data API actually enables something new, and potentially better. So I'd much rather AWS double down on it and quickly add some improvements.
I agree completely. We have APIs that are used both by our website and by our external customers (we sell our API for our customers to integrate with their websites and mobile apps), and for batch loads for internal use.
We deploy our APIs to Fargate for low, predictable latency for our customers and to Lambda [1] which handles scaling up like crazy and scaling down to 0 for internal use but where latency isn’t a concern.
Our pipeline deploys to both.
[1] As far as being “locked into lambda”, that’s not a concern. With API Gateway “proxy integration” you just add three or four lines of code to your Node/Express, C#/WebAPI, Python/Flask code and you can deploy your code as is to lambda. It’s just a separate entry point.
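For Python/Flask the shape is roughly this, using the aws-wsgi adapter as one example (the package choice and its `response` call are my assumption of one common adapter, not the only way to do it); the app itself is unchanged:

```python
# app.py -- an ordinary Flask app, deployable anywhere as-is
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users/<user_id>")
def get_user(user_id):
    return jsonify({"id": user_id})


# lambda_entry.py -- the separate Lambda entry point
import awsgi  # aws-wsgi adapter (assumed): translates the API Gateway proxy event to WSGI
from app import app

def handler(event, context):
    # API Gateway "proxy integration" hands the whole HTTP request to this function;
    # the adapter converts it and returns a proxy-shaped response.
    return awsgi.response(app, event, context)
```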
Yup, that's exactly how I recommend clients write lambdas for API purposes... Such a great balance of getting per-request pricing while retaining all existing tooling for building APIs.
For an internal API with one or two endpoints, I'll do things the native Lambda way. Your standard frameworks are heavy when all you need to do is respond to one or two events and you can do your own routing, use APIGW and API Key for authorization, etc.
There is also a threshold between “A group of developers will be developing this API and type safety would be nice so let’s use C#” and “I can write and debug this entire 20-50 line thing in the web console in Python, configure it using the GUI, and export a SAM CloudFormation template for our deployment pipeline.”
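For that "do your own routing" case, the handler really can be tiny. A sketch assuming the standard API Gateway proxy event shape (paths and payloads are made up):

```python
import json

def handler(event, context):
    """Hand-rolled router for a one-or-two endpoint internal API behind API Gateway."""
    method = event["httpMethod"]
    path = event["path"]

    if method == "GET" and path == "/health":
        body = {"status": "ok"}
    elif method == "POST" and path == "/jobs":
        body = {"accepted": json.loads(event["body"] or "{}")}
    else:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

    return {"statusCode": 200, "body": json.dumps(body)}
```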
> By the time you pay a fixed cost for the proxy on top of what you already pay for the RDS server, it'd be a far simpler architecture with less moving parts to just run a Fargate container
A lot of people want to use lambda (or serverless) even so. So AWS is just accommodating their wishes.
We can’t use Aurora Serverless even in our non-Prod environments because we have workflows that involve importing and exporting data to and from S3. But really, our Aurora servers in those environments are so small that most of our costs are storage.
Not to mention the same also applies to load. You get about 10ms at 10, 1,000, or 1,000,000 requests per second, again irrespective of how much data you have.
> Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.
This is only true for AWS. Azure functions share resources and don't have this issue.
The speed is actually quite sad. It's 5-10x slower than my other databases at p95, and I can't throw money at the problem on the write side. For reads I can use DAX, but then there goes consistency.
Good point! I would usually not recommend using a database from a different cloud provider just because of different hassles around permissions, connections, etc.
I've never found the speed an issue, but YMMV. To me, the best thing is that you won't see speed degradation as you scale. With a relational database, your joins will get slower and slower as the size of your database grows. With DynamoDB, it's basically the same at 1GB as it is at 10TB.
RDS maxes out RAM at 768GiB, if we're comparing managed to managed.
If you're approaching that point, you already are going to need an analytics pipeline, a search DB, etc., because maintaining ever-growing indices will kill your latency. You probably can get away with aggregations for a bit longer, but if the number of rows you aggregate is growing too, eventually you will need to come up with something, and doing it off a stream the way you do with Dynamo isn't a bad way to go about it with MySQL either.
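The stream approach looks roughly like this: a Lambda subscribed to the table's stream folds each new item into an aggregate row. A sketch with invented table and attribute names:

```python
import boto3

agg_table = boto3.resource("dynamodb").Table("order-totals")  # hypothetical aggregate table

def handler(event, context):
    """Lambda handler on a DynamoDB stream: folds INSERTs into a running total."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]  # stream records use the typed attribute format
        customer = new_image["customer_id"]["S"]
        amount = int(new_image["amount"]["N"])
        agg_table.update_item(
            Key={"pk": f"customer#{customer}"},
            UpdateExpression="ADD #t :a",
            ExpressionAttributeNames={"#t": "running_total"},  # alias avoids reserved-word issues
            ExpressionAttributeValues={":a": amount},
        )
```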
Looking at the tables I have access to, they all come in under 5ms for both reads and writes. This is the same ballpark as our MySQL apps for similar-style queries (i.e. not aggregations).
Sadly my favorite reason to use Dynamo is political, not technical. Since it somehow is not classified as a database at my company, the DBAs don't 'own' it. So I don't have to wait 2-3 months for them to manually configure something.
RDS goes up to 4TB of RAM on the X1e instance type. But the point is that RDBMS systems handle a large amount of data and many workload types before you need to reach for specialist systems.
I don't know how you are doing write transactions in 5ms on DynamoDB. Single puts at p50, maybe, but I've never seen p90 put operations below 10ms.
>but scale isn't a problem many people need solving when 2TB of RAM fits in a single box.
What's the price of that on the cloud? I know I can run crazy big tables on DynamoDB for a couple of dollars. I don't know what 1 month of a relational database with 2TB of RAM costs on the cloud, but I am pretty sure I can't afford it.
Very disappointed to find the top comment is about DynamoDB and not Alex and his wonderful book. I suppose this is par for the course with HN. I hope nothing I create ever ends up posted here.
You are disappointed the comment is about the subject of the book from someone who read it and didn’t find it a compelling read? I didn’t like the book because it continued a trend in mis-selling DynamoDB and my comment reflects that frustration. Sorry, but not every review is going to be glowing, and I’m certainly not going to make personal comments about the author.