Waves Author here. Happy to answer any questions folks have about the book, about DynamoDB, or about self-publishing.
NoSQL modeling is waaay different than relational modeling. I think a lot of NoSQL advice out there is pretty bad, which results in people dismissing the technology altogether. I've been working with DynamoDB for a few years now, and there's no way I'll go back.
The book has been available for about a month now, and I've been pretty happy with the reception. Strong support from Rick Houlihan (AWS DynamoDB wizard) and a lot of other folks at AWS.
You can get a free preview by signing up at the landing page. If you buy and don't like it, there's a full money-back guarantee with no questions asked. Also, if you're having income problems due to COVID, hit me up and we'll make something work :)
Anyhow, hit me up with questions!
EDIT: Added a coupon code for folks hearing about the book here. Use the code "HACKERNEWS" to save $20 on Basic, $30 on Plus, or $50 on Premium. :)
The biggest problem I'm aware of with DynamoDB is the hot key / partition issue[1]. Throughput is distributed evenly across nodes, and you can't control how many nodes you have, so you always have a node that's hot, either temporarily or permanently, and you end up having to overprovision all your nodes to handle that hot case, which ends up costing far more than the alternatives. What's your take on this? This is the chief reason I avoid DynamoDB, which in theory would be a good fit for some of my problems.
As of a couple of years ago, DynamoDB redistributes throughput between shards based on usage [1], so in theory this should eliminate the hot shard problem. I haven't had a chance to test this in practice; if anybody has hands-on experience, I'd love to hear it.
You also finally have a way of identifying hot keys with the terribly named CloudWatch Contributor Insights for DynamoDB. [2]
For exceptional use cases, you also have the option of On-Demand Capacity to pay for what you use and not worry about capacity at all. [3]
Basically, most of these issues are gone. As long as you don't have extreme skew in your partition keys, you don't need to worry about throughput limits.
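If you want to sidestep capacity planning entirely, or chase down hot keys, both are one-call operations. A rough sketch with boto3 (the table name is a placeholder):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Switch an existing table to on-demand (pay-per-request) billing,
    # so you pay per request instead of provisioning capacity upfront.
    dynamodb.update_table(TableName="MyTable", BillingMode="PAY_PER_REQUEST")

    # Enable CloudWatch Contributor Insights to surface the hottest keys.
    dynamodb.update_contributor_insights(
        TableName="MyTable", ContributorInsightsAction="ENABLE"
    )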
What was your approach to self-publishing here? What tools did you use? If I wanted to publish a book but knew nothing about it, what resources should I read and what approach would you recommend?
The biggest piece of advice I can give you is not about any specific tool; it's about an approach. You need to think about how you will market the book if you're self-publishing.
Engage with the community that will be interested in the book. Write articles, help out on Twitter, write code libraries, etc.
For me, I wrote DynamoDBGuide.com two and a half years ago over Christmas break. I just wanted to make an easier introduction to DynamoDB after I watched Rick Houlihan's talk at re:Invent (which is awesome).
That led to other opportunities and to me being seen as an 'expert' (even when I wasn't!). I got more questions and spent more time on DynamoDB to the point where I started to know more. I gave a few talks, etc.
I finally decided to do a book and set up a landing page and mailing list. I basically followed the playbook that Adam Wathan described for his first book launch.[0] Write in public, release sample chapters, engage with people, etc.
In terms of tooling, I used AsciiDoc to generate the book and Gumroad to sell. On a 1-10 scale, I'd give AsciiDoc a 5 and Gumroad an 8. But the tooling barely matters -- think about how to find the people that are interested :)
Happy to answer any other questions, either in public or via email.
Just bought the book. I've been working at AWS and using DynamoDB for years now, but I'm sure there are things I could be doing better. I love that you've dedicated attention to analytics and operations too.
Honest question: would you say "NoSQL modeling is way more restrictive, labor intensive and painful, but in turn gives you consistent performance as you scale" is a fair characterization?
One note on this -- if you have an LSI, you can't have an item collection larger than 10GB, where an item collection refers to all the items with the same partition key in your main table and your LSI.
A DynamoDB table with an LSI can scale far beyond 10GB. That said, I would avoid LSIs in almost all circumstances. Just go with a GSI.
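Part of why GSIs are the safer default: unlike an LSI, a GSI can be added to an existing table at any time. A rough boto3 sketch (index and attribute names here are hypothetical):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Add a GSI to an existing table after the fact -- something you
    # can't do with an LSI, which must be defined at table creation.
    dynamodb.update_table(
        TableName="MyTable",
        AttributeDefinitions=[
            {"AttributeName": "GSI1PK", "AttributeType": "S"},
            {"AttributeName": "GSI1SK", "AttributeType": "S"},
        ],
        GlobalSecondaryIndexUpdates=[
            {
                "Create": {
                    "IndexName": "GSI1",
                    "KeySchema": [
                        {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                        {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                    # Required for provisioned-capacity tables; omit if
                    # the table uses on-demand billing.
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": 5,
                        "WriteCapacityUnits": 5,
                    },
                }
            }
        ],
    )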
Thank you for the clarification, abd12 -- you are right. I only create an LSI if I know for sure the data within a given item collection will not go beyond 10GB... and since we never really know that for sure, we always go with a GSI.
I just answered this on Twitter, but I think there are two instances where it's a no-brainer to use DynamoDB:
- High-scale situations where you're worried about performance of a relational database, particularly joins, as it scales.
- If you're using serverless compute (e.g. AWS Lambda or AppSync), where traditional databases don't fit well with the connection model (see the sketch below).
That said, you can use DynamoDB for almost every OLTP application. It's more a matter of personal preference whether you want to use a relational database or something like DynamoDB. I pick DynamoDB every time b/c I understand how to use it and like the other benefits (billing model, permissions model, performance characteristics), but I won't say you're wrong if you don't choose it in these other situations.
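On the serverless point above: DynamoDB is accessed via signed HTTPS requests, so there's no connection pool to size or exhaust from Lambda. A minimal hypothetical handler (table and key names are made up):

    import os
    import boto3

    # Created once per container and reused across invocations. No
    # connection pool to manage -- every call is a stateless HTTPS request.
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])

    def handler(event, context):
        # Hypothetical lookup by user id from the event payload.
        resp = table.get_item(
            Key={"PK": f"USER#{event['userId']}", "SK": "PROFILE"}
        )
        return resp.get("Item")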
It's different than a relational database in that you need to model your data to your patterns, rather than model your data and then handle your patterns.
Once you learn the principles, it really is like clockwork. It changes your process, but you implement the same process every time.
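As a concrete illustration of modeling to your patterns (a hypothetical sketch, not an example from the book): design the primary key so that a single Query answers "get a user and their orders", rather than normalizing and joining:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("MyTable")

    # Items are written so the key design *is* the access pattern:
    #   PK = "USER#alexdebrie", SK = "PROFILE"         (the user)
    #   PK = "USER#alexdebrie", SK = "ORDER#2020-..."  (their orders)
    resp = table.query(KeyConditionExpression=Key("PK").eq("USER#alexdebrie"))
    items = resp["Items"]  # user profile + orders in one request, no join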
Honestly, I think part of the problem is that there's a lot of bad NoSQL content out there. A little standardization of process in this space will go a long way, IMO :)
Thank you! Glad you found DynamoDBGuide.com helpful :)
Yep, I do talk about aggregations in the book. One strategy that I've discussed is available in a blog post here[0] and involves using DynamoDB Transactions to handle aggregates.
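The rough shape of that transactional-aggregate pattern, as I understand it (a sketch, not the exact code from the post; table and key names are made up): write the new item and bump a parent-level count in one transaction, so the aggregate can't drift from the underlying items.

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Insert an order and increment the customer's order count atomically.
    dynamodb.transact_write_items(
        TransactItems=[
            {
                "Put": {
                    "TableName": "MyTable",
                    "Item": {
                        "PK": {"S": "USER#alexdebrie"},
                        "SK": {"S": "ORDER#1234"},
                        "Amount": {"N": "25"},
                    },
                    # Fail the whole transaction if this order already exists.
                    "ConditionExpression": "attribute_not_exists(PK)",
                }
            },
            {
                "Update": {
                    "TableName": "MyTable",
                    "Key": {
                        "PK": {"S": "USER#alexdebrie"},
                        "SK": {"S": "PROFILE"},
                    },
                    "UpdateExpression": "ADD OrderCount :one",
                    "ExpressionAttributeValues": {":one": {"N": "1"}},
                }
            },
        ]
    )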
If you're looking for large-scale aggregates for analytics (e.g. "What are my top-selling items last month?"), I have an Analytics supplement in the Plus package that includes notes on different patterns for analytics w/ DynamoDB.
Great point! It was originally about both of these things, but the storage aspect isn't discussed much anymore because it's really not a concern.
The data integrity issue is still a concern, and I talk about that in the book. You need to manage data integrity in your application and think about how to handle updates properly. But it's completely doable and many people have.
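One common building block here is optimistic locking with a version attribute -- a sketch of the general idea (not the book's code; the table and attribute names are hypothetical):

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("MyTable")

    def update_profile(user_id, new_email, expected_version):
        """Only write if nobody else has changed the item since we read it."""
        try:
            table.update_item(
                Key={"PK": f"USER#{user_id}", "SK": "PROFILE"},
                UpdateExpression="SET Email = :email, Version = :next",
                ConditionExpression="Version = :expected",
                ExpressionAttributeValues={
                    ":email": new_email,
                    ":next": expected_version + 1,
                    ":expected": expected_version,
                },
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                # Someone else wrote first: re-read and retry, or surface
                # the conflict to the caller.
                raise
            raise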
> You need to manage data integrity in your application
This is just another way of saying you need to implement your own system for managing consistency. Dynamo offers transactions now, but they don't offer actual serializability. Your transaction will simply fail if it runs into any contention. You might think that's ok, just retry. But because you've chosen to give up relational modelling of your data, this will happen a lot. If you want to use aggregates in Dynamo, you have to update an aggregate field every time you update your data, which means you create single points of contention for your failure-prone transactions to run into all over your app. There are ways around this, but they all come at the cost of further sacrificing consistency guarantees. And the main issue with that is that once you've reached an inconsistent state, it's incredibly difficult to even know it's happened, let alone fix it.
Then you run into the issue of actually implementing your data access. With a denormalized data set, implementing all your data access comes at the expense of increasing interface complexity. Your schema ends up having objects, which contain arrays of objects, which contain arrays... and you have to design all of your interfaces around keys that belong to the top level parent object.
The relational model wasn't designed to optimize one type of performance over another. It was designed to optimize operating on a relational dataset, regardless of the implementation details of the underlying management system. Trying to cram a relational dataset into a NoSQL DB unavoidably comes at the expense of some very serious compromises, with the primary benefit being that the DB itself is easier to administer. It's not as simple as cost of storage vs cost of compute. NoSQL DBs like Dynamo (actually, especially Dynamo) are great technology, with many perfectly valid use cases. But standing in for an RDBMS is not one of them, and everybody I've seen attempt to use Dynamo as an RDBMS eventually regrets it.
Are there any basic examples you can give around maintaining that integrity?
I'm liking DynamoDB for tasks that fit nicely within a single domain, have relatively pain-free access patterns, etc. And I've found good fits, but there are some places where the eventual consistency model makes me nervous.
I'm specifically thinking about updating multiple different DynamoDB keys that might need to be aggregated for a data object. The valid answer may be "don't do that!" – if so, what should I do?
> Are there any basic examples you can give around maintaining that integrity?
For those types of use cases, the OP's advice would actually require implementing a fully bespoke concurrency control system in your business logic layer. Without trying to disparage the OP, this is, for all intents and purposes, impossible (aside from also being very, very impractical). There are some things you can do to create additional almost-functional (though still highly impractical) consistency controls for Dynamo (like throttling through FIFO queues), but they all end up being worse performance and scaling trade-offs than you'd get from simply using an RDBMS.
A lot of it boils down to the fact that dynamo doesn’t have (and wasn’t designed to have) locking, meaning that pretty much any concurrency control system you want to implement on top of it, is eventually going to run into a brick wall. The best you’d possibly be able to do is a very, very slow and clunky reimplementation of some of Spanner’s design patterns.
Yeah, the limited transaction support is a killer for many use cases.
Thankfully, it’s not actually all that hard to implement your own “dynamo layer” on top of an SQL database and get most of the scaling benefits without giving up real transactions.
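To illustrate the idea (my sketch, not the parent's actual system): the core of such a layer is a two-part key plus a blob column, with key-range queries standing in for Query and real transactions underneath. Using sqlite3 just to keep the example self-contained:

    import sqlite3

    conn = sqlite3.connect("dynamo_layer.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS items (
            pk   TEXT NOT NULL,
            sk   TEXT NOT NULL,
            data TEXT NOT NULL,   -- JSON blob for the item body
            PRIMARY KEY (pk, sk)
        )
    """)

    # Emulate begins_with(sk, 'ORDER#') with a key range: '$' is the
    # next ASCII character after '#', so this captures every 'ORDER#...'
    # sort key. Wrap writes in real SQL transactions as needed.
    rows = conn.execute(
        "SELECT sk, data FROM items WHERE pk = ? AND sk >= ? AND sk < ?",
        ("USER#alexdebrie", "ORDER#", "ORDER$"),
    ).fetchall()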
I think there are some features of DynamoDB that are miles ahead of other databases:
- Billing model. You pay for reads and writes directly rather than trying to guess how your queries turn into CPU & RAM. You can also scale reads and writes up and down independently (see the sketch below), or use pay-per-use pricing to avoid capacity planning.
- Permissions model. It integrates tightly with AWS IAM, so it works well with AWS compute (EC2, ECS/EKS, Lambda) via IAM roles. You don't need to think about credential management and rotation.
- Queries will perform the same as you scale. It's going to work the exact same in testing and staging as it does in prod. You don't need to rewrite when you get four times as many users.
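The independent read/write scaling mentioned in the first bullet is a single API call -- a rough sketch (the numbers and table name are made up):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Reads and writes are separate dials: quadruple read capacity for a
    # traffic spike without touching write capacity.
    dynamodb.update_table(
        TableName="MyTable",
        ProvisionedThroughput={
            "ReadCapacityUnits": 400,
            "WriteCapacityUnits": 100,
        },
    )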
A lot of folks are worried about migrations, but they're not as bad as you think. I've got a whole chapter on how to handle them. Plus, one of the examples imagines that we're revisiting a previous example a year later and want to add new objects and change some access patterns. I show how it all works, and migrations really aren't that scary.
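This isn't the book's code, but the usual shape of such a migration is a one-off backfill: scan the existing items and decorate the relevant ones with new index attributes for the new access pattern (all names here are hypothetical):

    import boto3

    table = boto3.resource("dynamodb").Table("MyTable")

    # Backfill: add GSI1 attributes to existing Order items so a new
    # "orders by status" access pattern can be served by a new GSI.
    scan_kwargs = {}
    while True:
        page = table.scan(**scan_kwargs)
        for item in page["Items"]:
            if item["SK"].startswith("ORDER#"):
                table.update_item(
                    Key={"PK": item["PK"], "SK": item["SK"]},
                    UpdateExpression="SET GSI1PK = :pk, GSI1SK = :sk",
                    ExpressionAttributeValues={
                        ":pk": f"STATUS#{item['Status']}",
                        ":sk": item["SK"],
                    },
                )
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]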
Author here! If you want more, I just released a book on DynamoDB yesterday --> https://www.dynamodbbook.com/ . There's a launch discount for the next few days.
The book is highly recommended by folks at AWS, including Rick Houlihan, the leader of the NoSQL Blackbelt Team at AWS[0].
Happy to answer any questions you have! Also available on Twitter and via email (I'm easily findable).
After seeing Rick's re:Invent talk, the one where at about minute 40 everyone's heads exploded, I emailed him (I'm in a very far away other department of Amazon) to ask him for more, because everything he was saying was absolutely not the way my group was using DynamoDB (i.e., we were doing it wrong).
He could have ignored my email entirely. He's a busy guy, right? I wouldn't have held it against him at all. Instead, he was super nice, provided me with more documentation, and honestly was just really helpful.
Shawn lists a four-part breakdown that's pretty on-point:
- Background and basics (Chapters 1-6)
- General advice for modeling & implementation (Chapters 7-9)
- DynamoDB strategies, such as how to handle one-to-many relationships, many-to-many relationships, complex filtering, migrations, etc. (Chapters 10-16).
- Five full walkthrough examples, including some pretty complex ones. One of them implements most of the GitHub metadata backend. Nothing related to the git contents specifically but everything around Repos, Issues, PRs, Stars, Forks, Users, Orgs, etc.
You can probably find the content of the first nine chapters if you google around enough; it's just helpful to have it all in one place. But the last 13 chapters are unlike anything else available, if I do say so myself. Super in-depth stuff.
I think it will be helpful even if you've been using it for a while, but it really depends on how deep you went on DynamoDB.
It's a bit expensive, especially with exchange rates, but it is what it is -- these things take time and effort to produce.
From the website, I cannot see a table of contents for the book, unless it's behind the "provide an email for free chapters" signup. Can you please provide a publicly visible table of contents (no email required) on the site or here?
This will be the deciding factor in whether I make a purchase, as I'll be able to see what the book covers and whether it'll be of use to me, having several years of Dynamo experience.
The concepts are definitely applicable. You'll need to do some small work to translate vocabulary and work around a slightly different feature set, but most of it should work for you.
And yep, Cassandra is pretty similar to DynamoDB. Both are wide-column data stores. Some of the original folks that worked on Dynamo (not DynamoDB) at Amazon.com went to Facebook and worked on Cassandra. The concepts underlying Dynamo became the basis for DynamoDB at AWS.
If you take Cassandra and remove the power user features whose runtime cost is hard to predict, what you're left with is pretty close to DynamoDB. It's harder to use and incompatible with everything else, but its key feature is not overpromising capacity. We're only considering alternatives because there's no cost saving story around tiered storage.
Thanks! Honestly, I'm right with you. I would love a physical copy. I did a bit of research and didn't find any great options for making a physical copy of a self-published book for the number of copies I'm expecting to sell (given it's a fairly niche technical area).
That said, if anyone has any great recommendations here, I'm all ears. Actual experience would be best if possible, rather than the first thing you see in Google :).
The market is shaping each generation. Unfortunately, it's pretty common to interact with engineers who don't think before doing; they just start doing with the hope of "figuring it out as we go". This leaves no room for beautiful, simple code.
Is it a side-effect of the web boom? How much of this was the result of the PHP and JavaScript subcultures? Couldn't say.
If you read the parent comment with a feeling of frustration and dread, ask yourself: "Am I bad at writing a first draft? Or am I bad at editing my writing?" These are two different things.
For learning how to write a first draft without wanting to dig your nails into your arms (or resort to more drastic forms of self-harm), I recommend:
1) "Start with Why" -- Start each draft by writing your goal and then writing the "signs of success" which you can use to recognize making progress.
I find a particularly motivating form of "why" is a description of a problem someone finds themselves in which they want advice about. Reddit is a good source of these if you want inspiration, but you might be better off giving advice to your future self.
2) Take inspiration from automated testing -- If you are anxious about some section being [maliciously] misinterpreted, write down that anxiety with a pointer to the section and a promise to yourself to have a trusted friend read it.
3) Question-Driven-Drafting -- Start with a question. Write the first flawed answer that comes to mind. Write the first question or objection that comes to mind from that answer. Write the first response that comes to mind from that etc... Don't delete.
4) Alternate focusing with exploring -- Set a pomodoro timer. Use my method from #3 or another to produce a bunch of text. When it goes off, congratulate yourself and meditate for 2 minutes. Then set another pomodoro timer and start to turn your ideas into a structured outline: Try to extract the one-sentence key point from the text you just wrote. Then try to build a pyramid-shaped hierarchy under it with the names of the supporting points. When this pomodoro ends, meditate again and start another rambling pomodoro based on the most interesting point.
5) Be willing to "overthink things" -- When you have a question, be willing to actually trust yourself that the question is worth answering. If someone else thinks the answer is obvious, just move on to ask someone else. If someone screams at you that your question is excuse-making, bullshit, or procrastinating, just move on from them. You no longer have parents nor teachers to endure and can write from your own desire to understand the world and communicate ideas. The confusion you notice in yourself is worthy not of ridicule but of sympathetic curiosity.
6) Go for a walk -- It's just a generally good idea.
7) Dictate into otter.ai -- It is as good a use for your time while walking as any. The resulting text will be heavy with misspellings, but you'll be able to edit it and it will restore your sense of confidence in your ability to generate ideas.
8) Work with a writing coach or therapist if you can find a good one. They might be expensive, but they're less expensive than getting fired because you handed in a blank performance self-evaluation.
The two sentences right before what you quoted are helpful:
"I’m not a Docker or Flask performance expert, and that’s not the goal of this exercise. To remedy this, I decided to bump the specs on my deployments.
The general goal for this bakeoff is to get a best-case outcome for each of these architectures, rather than an apples-to-apples comparison of cost vs performance."
I wasn't trying to squeeze out every ounce of performance and determine the minimum number of instances to handle 100 req/sec. I was trying to normalize across the three patterns as much as possible to see best-case performance. I didn't want resource constraints to be an excuse.
I think there’s a pretty big flaw in your benchmark because that failure rate is _insane_ and you shouldn’t need anywhere near that hardware to accomplish this. I don’t think your data is credible as a result.