I bought the book, I read the book, I've used DynamoDB for a while. It didn't change my mind. DynamoDB makes tradeoffs in order to run at massive scale, but scale isn't a problem many people need to solve when 2TB of RAM fits in a single box. Meanwhile I need to handle eventual consistency, an analytics pipeline, another database for fuzzy search, another geo lookup database, Lambda functions to do aggregations, and a pile of custom code, all while giving up tooling that's readily available in the RDBMS world.
In a world where Opex is much higher than Capex, DynamoDB might make sense, but for me server costs are 5% of dev costs. And even if it works from a cost perspective, how many AWS services have their console experience ruined by DynamoDB? The UI tricks you into thinking it's a data table with sortable columns, but no! DynamoDB limitations strike again and you are off on a journey of endless paging. The cost savings come at the expense of the user.
DynamoDB also isn't fast. 20ms for a query isn't fast, 30ms for an insert isn't fast. Yes, it's amazingly consistent and faster than other systems holding 500TB, but that isn't a use case for many users.
Others are comparing DynamoDB to Redis and Cassandra. It has additional limitations. These are fairly clearly spelled out now but maybe weren't highlighted as prominently a few years back. (I say that because I inherited an application that made heavy use of DynamoDB but turned out not to be a great fit for it.)
- It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.
- You can store a maximum of 400 KB of data in one row.
- You can get a maximum of 1 MB of data returned in a single query.
So it's mostly good for high-data-throughput applications, and then only if your high data throughput consists of large numbers of small records, processed a few at a time. This surely describes an important class of workloads. You may suffer if your workload isn't in this class.
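To make that 1 MB limit concrete: a Query that matches more than a megabyte returns a partial page plus a LastEvaluatedKey, and it's on you to keep calling back until it's gone. A minimal sketch with boto3 (table and key names here are made up):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

def query_all(pk_value):
    """Collect every item under one partition key, following the 1 MB pages."""
    items = []
    kwargs = {"KeyConditionExpression": Key("pk").eq(pk_value)}
    while True:
        resp = table.query(**kwargs)
        items.extend(resp["Items"])
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key  # resume where the last page stopped
    return items
```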
Another annoyance is that (in my experience) one of the most common errors you will encounter is ProvisionedThroughputExceededException, when your workload changes faster than the auto-scaling can react. Until last year you couldn't test this scenario offline with the DynamoDB Local service, because DynamoDB Local didn't implement capacity limits.
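The usual workaround is wrapping writes in your own backoff on top of whatever retries the SDK already does internally. A rough sketch with boto3 (table name and backoff numbers are arbitrary):

```python
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

def put_with_backoff(item, max_attempts=5):
    """Retry throttled writes with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return table.put_item(Item=item)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise  # some other failure, don't retry
            time.sleep((2 ** attempt) * 0.1)  # 100ms, 200ms, 400ms, ...
    raise RuntimeError("still throttled after retries")
```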
> - It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.
That is _infuriating_
It's documented, but it is so surprising when you first hit it. Sometimes empty values have semantics attached to them; I don't want to scrub them out.
I think that basically describes DDB in a nutshell. You think it’s like Mongo, only on AWS, but then you slowly find out it’s much, much worse (for most use cases that you’d go with Mongo).
Note that Cassandra has similar limitations with data size and throughput, but they aren't enforced or documented (because they depend on your particular setup), and your queries just fail or, worse, make all queries to the same node in the cluster fail (fun times with large wide rows).
The rich data types in Dynamo are quite strange; since they're basically useless for querying, I'm not sure why you would use them. Maybe I'm missing something...
The rich data types can be useful for filtering, I guess? When you are running a query against a certain hash key and want the first record (sorted by range key) that meets a condition placed on a nested property of a map or whose "tags" property is a string set containing a certain member, for example.
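Something like this, as a sketch with boto3 (table, key, and attribute names are invented). Worth remembering that the filter runs after the read, so Limit and the 1 MB cap apply to the unfiltered items:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("events")  # hypothetical table

# First item under this hash key (sorted by range key) whose "tags" string set
# contains "urgent" and whose nested map attribute details.state equals "open".
resp = table.query(
    KeyConditionExpression=Key("pk").eq("customer#123"),
    FilterExpression=Attr("tags").contains("urgent") & Attr("details.state").eq("open"),
    Limit=50,  # Limit counts items read, not items that survive the filter
)
first_match = resp["Items"][0] if resp["Items"] else None
```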
If you treat DynamoDB as a DBMS, you’re going to be disappointed (for the reasons you mention). But if you think of it as a highly-durable immediately-consistent btree in the cloud, it’s amazing. DynamoDB is closer to Redis than MySQL. Amazon does it a disservice by putting it in the databases category.
It's not just that it's put in the database category, but that its champions at AWS make statements like "if you are utilising RDBMS you are living in the past", or that "there are very few use cases to choose Postgres over DynamoDB".
DynamoDB is like Redis without the fun data structures, the fantastic CLI and discoverability, the useful configurable tradeoff between fast and consistent, and really much-needed features such as listing your keys.
Daniel, I'm a big fan of yours but disagree with this take :).
It's definitely a database. The modeling principles are different, and you won't get some of the niceties you get with a RDBMS, but it still allows for flexible querying and more.
S3 and DDB are incredibly similar. Their fundamental operators are the same: key-value get/put and ordered list, and their consistency is roughly the same.
What differentiates DDB and S3 the most is cost and performance.
They're both highly-durable primitive data structures in the cloud, with a few extra features attached.
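You can see the parallel directly in the SDK calls; bucket, table, and attribute names here are placeholders:

```python
import boto3

s3 = boto3.client("s3")
ddb = boto3.client("dynamodb")

# S3: key-value put/get plus an ordered list over a key prefix
s3.put_object(Bucket="my-bucket", Key="users/42", Body=b'{"name": "a"}')
obj = s3.get_object(Bucket="my-bucket", Key="users/42")["Body"].read()
listing = s3.list_objects_v2(Bucket="my-bucket", Prefix="users/")

# DynamoDB: the same three operations, over a partition key and sort key
ddb.put_item(TableName="users",
             Item={"pk": {"S": "users"}, "sk": {"S": "42"}, "name": {"S": "a"}})
item = ddb.get_item(TableName="users",
                    Key={"pk": {"S": "users"}, "sk": {"S": "42"}})["Item"]
page = ddb.query(TableName="users",
                 KeyConditionExpression="pk = :p AND begins_with(sk, :s)",
                 ExpressionAttributeValues={":p": {"S": "users"}, ":s": {"S": "4"}})
```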
No, I meant a DBMS. Almost all the things you'd expect in a DBMS are not there. Take a look at most of the comments in this thread. Everyone expecting DBMS things, and complaining that they're not there.
Fair enough! I think that's a reasonable position.
IMO, there are two times you should absolutely default to DynamoDB:
- Very high scale workloads, due to its scaling characteristics
- Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.
You can use DynamoDB for almost all OLTP workloads, but outside of those two categories, I won't fault you for choosing an RDBMS.
Agree that DynamoDB isn't _blazing_ fast. It's more that it's extremely consistent. You're going to get ~10 millisecond response times when you have 1GB of data or when you have 10 TB of data, and that's pretty attractive.
True! I'm not a huge fan of Aurora Serverless and the Data API. The scaling for Aurora Serverless is slow enough that it's not really serverless, IMO. And the Data API adds a good bit of latency and has a non-standard request & response format, so it's hard to use with existing libraries. But it's definitely an option for those that want Lambda + RDBMS.
The RDS Proxy is _hopefully_ a better option in this regard but still early.
Differing opinion - I think RDS Proxy is the wrong approach. Adding an additional fixed cost service to enable lambda seems like an indicator of a bad architecture. In this case the better approach would likely be to just use a Fargate container which would have a similar cost and fewer moving parts.
By the time you pay a fixed cost for the proxy on top of what you already pay for the RDS server, it'd be a far simpler architecture with less moving parts to just run a Fargate container (or better yet, AWS would offer a Google Cloud Run competitor)
The Data API, while still rough around the edges, at least keeps the solution more "serverless-y". Over time it should get easier to work with as tooling improves. At the very least, it won't be more difficult to work with than DynamoDB was initially with its different paradigm.
For services that truly require consistently low latency, lambda shouldn't be used anyway, so the added latency of the data api shouldn't be a big deal IMO.
For those reasons, I view the RDS Proxy as an ugly stopgap that enables poor architecture, whereas the Data API actually enables something new, and potentially better. So I'd much rather AWS double down on it and quickly add some improvements.
I agree completely. We have APIs that are used both by our website and by our external customers (we sell our API for our customers to integrate with their websites and mobile apps), and for batch loads for internal use.
We deploy our APIs to Fargate for low, predictable latency for our customers and to Lambda [1] which handles scaling up like crazy and scaling down to 0 for internal use but where latency isn’t a concern.
Our pipeline deploys to both.
[1] As far as being “locked into lambda”, that’s not a concern. With API Gateway “proxy integration” you just add three or four lines of code to your Node/Express, C#/WebAPI, Python/Flask code and you can deploy your code as is to lambda. It’s just a separate entry point.
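For Python/Flask the shape is roughly this, using the aws-wsgi adapter as one example (the package choice and its `response` call are my assumption of one common adapter, not the only way to do it); the app itself is unchanged:

```python
# app.py -- an ordinary Flask app, deployable anywhere as-is
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users/<user_id>")
def get_user(user_id):
    return jsonify({"id": user_id})


# lambda_entry.py -- the separate Lambda entry point
import awsgi  # aws-wsgi adapter (assumed): translates the API Gateway proxy event to WSGI
from app import app

def handler(event, context):
    # API Gateway "proxy integration" hands the whole HTTP request to this function;
    # the adapter converts it and returns a proxy-shaped response.
    return awsgi.response(app, event, context)
```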
Yup, that's exactly how I recommend clients write lambdas for API purposes... Such a great balance of getting per-request pricing while retaining all existing tooling for building APIs.
For an internal API with one or two endpoints, I'll do things the native Lambda way. Your standard frameworks are heavy when all you need to do is respond to one or two events and you can do your own routing, use APIGW and API Key for authorization, etc.
There is also a threshold between “A group of developers will be developing this API and type safety would be nice so let’s use C#” and “I can write and debug this entire 20-50 line thing in the web console in Python, configure it using the GUI, and export a SAM CloudFormation template for our deployment pipeline.”
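For that "do your own routing" case, the handler really can be tiny. A sketch assuming the standard API Gateway proxy event shape (paths and payloads are made up):

```python
import json

def handler(event, context):
    """Hand-rolled router for a one-or-two endpoint internal API behind API Gateway."""
    method = event["httpMethod"]
    path = event["path"]

    if method == "GET" and path == "/health":
        body = {"status": "ok"}
    elif method == "POST" and path == "/jobs":
        body = {"accepted": json.loads(event["body"] or "{}")}
    else:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

    return {"statusCode": 200, "body": json.dumps(body)}
```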
> By the time you pay a fixed cost for the proxy on top of what you already pay for the RDS server, it'd be a far simpler architecture with less moving parts to just run a Fargate container
A lot of people want to use lambda (or serverless) even so. So AWS is just accommodating their wishes.
We can’t use Aurora Serverless even in our non-Prod environments because we have workflows that involve importing and exporting data to and from S3. But really, our Aurora servers in those environments are so small that most of our costs are storage.
Not to mention the same also applies to load. You get about 10ms at 10, 1,000, or 1,000,000 requests per second, again irrespective of how much data you have.
> Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.
This is only true for AWS. Azure functions share resources and don't have this issue.
The speed is actually quite sad. It's 5-10x slower than my other databases at p95, and I can't throw money at the problem on the write side. For reads I can use DAX, but then there goes consistency.
Good point! I would usually not recommend using a database from a different cloud provider just because of different hassles around permissions, connections, etc.
I've never found the speed an issue, but YMMV. To me, the best thing is that you won't see speed degradation as you scale. With a relational database, your joins will get slower and slower as the size of your database grows. With DynamoDB, it's basically the same at 1GB as it is at 10TB.
RDS maxes out RAM at 768GiB, if we're comparing managed to managed.
If you're approaching that point, you already are going to need an analytics pipeline, a search DB, etc., because maintaining ever-growing indices will kill your latency. You probably can get away with aggregations for a bit longer, but if the number of rows you aggregate is growing too, eventually you will need to come up with something, and doing it off a stream the way you do with Dynamo isn't a bad way to go about it with MySQL either.
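The stream approach looks roughly like this: a Lambda subscribed to the table's stream folds each new item into an aggregate row. A sketch with invented table and attribute names:

```python
import boto3

agg_table = boto3.resource("dynamodb").Table("order-totals")  # hypothetical aggregate table

def handler(event, context):
    """Lambda handler on a DynamoDB stream: folds INSERTs into a running total."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]  # stream records use the typed attribute format
        customer = new_image["customer_id"]["S"]
        amount = int(new_image["amount"]["N"])
        agg_table.update_item(
            Key={"pk": f"customer#{customer}"},
            UpdateExpression="ADD #t :a",
            ExpressionAttributeNames={"#t": "running_total"},  # alias avoids reserved-word issues
            ExpressionAttributeValues={":a": amount},
        )
```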
Looking at the tables I have access to, they all come in under 5ms for both reads and writes. This is the same ballpark as our MySQL apps for similar-style queries (i.e. not aggregations).
Sadly my favorite reason to use Dynamo is political, not technical. Since it somehow is not classified as a database at my company, the DBAs don't 'own' it. So I don't have to wait 2-3 months for them to manually configure something.
RDS goes up to 4TB of RAM on the X1e instance type. But the point is that RDBMS systems handle a large amount of data and many workload types before you need to reach for specialist systems.
I don't know how you are doing write transactions in 5ms on DynamoDB. Single puts at p50, maybe, but I've never seen p90 put operations below 10ms.
>but scale isn't a problem many people need solving when 2TB of RAM fits in a single box.
What's the price of that on the cloud? I know I can run crazy big tables on DynamoDB for a couple of dollars. I don't know what 1 month of a relational database with 2TB of RAM costs on the cloud, but I am pretty sure I can't afford it.
Very disappointed to find the top comment is about DynamoDB and not Alex and his wonderful book. I suppose this is par for the course with HN. I hope nothing I create ever ends up posted here.
You are disappointed the comment is about the subject of the book from someone who read it and didn’t find it a compelling read? I didn’t like the book because it continued a trend in mis-selling DynamoDB and my comment reflects that frustration. Sorry, but not every review is going to be glowing, and I’m certainly not going to make personal comments about the author.