We took a big bet on Cassandra, and then on an opinionated wrapper around Cassan...

PeterCorless · on Aug 24, 2021

We just benchmarked Cassandra 4.0, which is brand-spanking new.

The good news: C4.0 is a far better performing database than C3.11. The new GCs definitely get rid of the long tail latency nightmares:

https://www.scylladb.com/2021/08/19/cassandra-4-0-vs-cassand...

However, we also compared it to Scylla's latest release, and though C4 is better*, you can still find other CQL-compatible databases that outperform it. Especially around compactions and topology changes:

https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-...

Just published these numbers today.

imadeamistake · on Aug 24, 2021

I like Scylla. I've been working with it for a while now, and it's a good alternative for transactional loads. It's a hell of a lot faster than Cassandra, and much much much much cheaper than DynamoDB. Cassandra has always felt like improvements came in fits and starts. I work at a Fortune 50 company, and Amazon quoted us ~$2-3 million a year to run our load on Dynamo (we were paying $350k/year for Aurora). With Scylla, we're looking at > $100k to get an order of magnitude better performance and a nice stable system that doesn't wake me up in the middle of the fucking night.

I've been following Scylla since the beginning (only because I like their mascot), and it's actually sort of interesting to see what's going on with the company. I've spent the past few years designing transactional systems backed by Cassandra. This is the first time I've been able to use Scylla on someone else's dime, though. The unpleasantly big company I'm at right now is looking to replace a bunch of infrastructure with ScyllaDB (Couchbase, Cassandra, Elasticsearch, DynamoDB). It's catching on for sure, but it still doesn't return any results when I search Dice. It looks like Discord is hiring, though...

cityzen · on Aug 25, 2021

I'd love to hear how Dynamo would end up being $2-3 million a year. They sure do a great job of convincing people that it's cheap, so I'm curious where the cost seems to blow up?

PeterCorless · on Aug 25, 2021

If you are doing north of a million ops on DynamoDB you can quickly run into the $2-3 million a year range.

In this 2018 benchmark, we were able to calculate that a sustained, provisioned workload of only 160k write ops / 80k read ops for DynamoDB would cost >$500k per year:

https://www.scylladb.com/2018/12/13/scylla-vs-amazon-dynamod...

That was a few years ago. These days, according to our most current pricing you could do DynamoDB provisioned, 1 year reserved for $38,658/month, which is "only" $463,896 annually (pop up the "Details" button and choose "vs. DynamoDB"):

https://www.scylladb.com/pricing/?writes=160000&reads=80000&...

The same workload on Scylla Cloud would be only $7,442/month, or $89,304 annually.

If you wanted, say, 1m ops — 500k write / 500k read ops — on DynamoDB, that'll run you $131,078/month, or $1,572,936 per year.

https://www.scylladb.com/pricing/?writes=500000&reads=500000...

The same workload on Scylla Cloud would run $29,768 reserved/month, or $357,216 per annum — 77% cheaper.

Of course, all of this is just pure list price. Depending on volume you might be able to negotiate better pricing. However, you'd need a really steep discount for DynamoDB just to get back to Scylla Cloud's list price.

Let me know if you spot any math errors or omissions on my part.
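
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the monthly list prices quoted above; the names `MONTHLY_USD`, `annual`, and `savings_pct` are made up for illustration and are not part of any Scylla or AWS tooling.

```python
# Monthly list prices (USD) as quoted in this comment.
# Dictionary keys and function names are hypothetical, chosen for illustration.
MONTHLY_USD = {
    "160k writes / 80k reads": {"dynamodb": 38_658, "scylla_cloud": 7_442},
    "500k writes / 500k reads": {"dynamodb": 131_078, "scylla_cloud": 29_768},
}


def annual(monthly_usd: int) -> int:
    """Annual cost, assuming twelve identical monthly bills."""
    return monthly_usd * 12


def savings_pct(baseline_usd: int, alternative_usd: int) -> float:
    """Percentage saved by paying alternative_usd instead of baseline_usd."""
    return 100.0 * (baseline_usd - alternative_usd) / baseline_usd


if __name__ == "__main__":
    for workload, price in MONTHLY_USD.items():
        dynamo = annual(price["dynamodb"])
        scylla = annual(price["scylla_cloud"])
        pct = savings_pct(dynamo, scylla)
        print(f"{workload}: DynamoDB ${dynamo:,}/yr, "
              f"Scylla Cloud ${scylla:,}/yr, {pct:.1f}% cheaper")
```

This only multiplies and divides the quoted list prices; it does not model negotiated discounts, on-demand pricing, or storage and transfer costs.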

cityzen · on Aug 25, 2021

Thanks for the response, Peter. I may shoot you an email with some additional questions.

throwdbaaway · on Aug 25, 2021

> Maybe the situation has changed since 2016. In my experience with several employers since then, it seems like every enterprise architect fell in love with Cassandra around 2014-2015 and then had a long, painful, protracted breakup.

I think 2012-2014 was peak marketing from DataStax. There would be some new major feature with every new blog post, and it would mostly never work as expected. Between 2017 and now, things have settled down.

trashcan · on Aug 24, 2021

I've used Cassandra at two companies, and had the exact same experience as you at the first company. At a much bigger company that had some very, very highly paid Cassandra DBAs it was actually a relatively smooth experience.

objektif · on Aug 24, 2021

How much is very very high pay?

trashcan · on Aug 24, 2021

I don't think it would be appropriate for me to say very specifically, but I suspect about double what a software engineer with the same amount of experience would earn.

imadeamistake · on Aug 24, 2021

Data is big $$$. Slap a couple of NoSQL databases and Spark on your resume and watch the money roll in. DBAs are disappearing with managed services, though.

legerdemain · on Aug 25, 2021

Yeah, I don't know about "slap." We want you to have deep production experience with these systems. Designing them, deploying them at significant scale, predicting their pitfalls and avoiding them proactively. Diagnosing systemic problems and finding reasonable solutions.

If you can't magically put out production fires, on huge high-throughput systems, potentially in the dead of night, we are unlikely to pay you $300-400K.

imadeamistake · on Aug 25, 2021

The intent was to be a little hyperbolic and self-effacing. In terms of competent and capable developers, I think it’s hard to get a better return on your skill set than adding “data” stuff. And honestly I think it’s one of the most critical skill sets that is lacking across the board. So many big companies have great data engineering teams, but generally other dev teams are left to design their own databases, which is a shit show. And even then, it’s amazing to me how difficult it is for data engineering teams to move from framework to framework without just mapping old solutions onto new technology.

My career has been primarily focused on something like “bringing modern data-driven solutions” to big companies. The one thing that is a constant challenge is that most teams (and leadership) aren’t prepared to handle the responsibilities of data engineering and stewardship in transactional, operational systems. I feel like a critical responsibility when I come on as a consultant is to impart knowledge about managing their data.

PeterCorless · on Aug 24, 2021

If you have been branding yourself as a DBA, time to lift and shift to "DevOps."

trashcan · on Aug 25, 2021

The folks I am referring to were actually hired as DevOps DBAs.

Distributed databases are pretty difficult to manage; they deserve every dollar.

ZephyrBlu · on Aug 25, 2021

Software Engineer salaries vary wildly, so this is not particularly helpful. You could easily be talking about anywhere between $200k-$700k.

trashcan · on Sept 2, 2021

To elaborate, I meant 2x compared to SWEs at the same company. I'd prefer not to post exact dollar amounts as they are relative based on location, company, and several other things.

yukinon · on Aug 24, 2021

If you had to do it again, what would you choose instead?

legerdemain · on Aug 24, 2021

These days? Redshift or B̶i̶g̶Q̶u̶e̶r̶y̶ BigTable (thanks for correcting my think-o!). Back then? Maybe HBase.

chillacy · on Aug 24, 2021

Do you mean BigTable? That's Google's "HBase" insofar as HBase is based on the BT paper.

From what I recall from using it a few years ago, it's pretty damn fast, very low latency. HBase had speedy p50s as well but tended to get quite slow at p99 due to GC.