We took a big bet on Cassandra, and then on an opinionated wrapper around Cassandra at $PASTJOB. The use case was a text search engine for syslog-type stuff.
The product we built using Cassandra was widely known as our buggiest and least maintainable, and it died a merciful death after several years of being inflicted on customers.
We didn't have a good handle on the exact perf implications of different values of read/write replication. Writing product code to handle a range of eventual consistency scenarios is challenging. The memory consumption and duration of compactions and column/node repair jobs are hard to model and accommodate. It's hard to tell what the cluster is doing at any given moment. Our experience with support plans from DataStax was also pretty dismal.
Maybe the situation has changed since 2016. In my experience with several employers since then, it seems like every enterprise architect fell in love with Cassandra around 2014-2015 and then had a long, painful, protracted breakup.
However, we also compared it to Scylla's latest release, and though C4 is better*, you can still find other CQL-compatible databases that outperform it. Especially around compactions and topology changes:
I like Scylla. I've been working with it for a while now, and it's a good alternative for transactional loads. It's a hell of a lot faster than Cassandra, and much much much much cheaper than DynamoDB. Cassandra has always felt like improvements came in fits and starts. I work at a Fortune 50 company, and Amazon quoted us ~$2-3 million a year to run our load on Dynamo (we were paying $350k/year for Aurora). With Scylla, we're looking at > $100k to get an order of magnitude better performance and a nice stable system that doesn't wake me up in the middle of the fucking night.
I've been following Scylla since the beginning (only because I like their mascot), and it's actually sort of interesting to see what's going on with the company. I've spent the past few years designing transactional systems backed by Cassandra. This is the first time I've been able to use Scylla on someone else's dime, though. The unpleasantly big company I'm at right now is looking to replace a bunch of infrastructure with ScyllaDB (Couchbase, Cassandra, Elasticsearch, DynamoDB). It's catching on for sure, but it still doesn't return any results when I search Dice. It looks like Discord is hiring, though...
I'd love to hear how Dynamo would end up being $2-3 million a year. They sure do a great job of convincing people that it's cheap, so I'm curious where the cost seems to blow up?
If you are doing north of a million ops on DynamoDB you can quickly run into the $2-3 million a year range.
In this 2018 benchmark, we were able to calculate that a sustained, provisioned workload of only 160k write ops / 80k read ops for DynamoDB would cost >$500k per year:
That was a few years ago. These days, according to our most current pricing you could do DynamoDB provisioned, 1 year reserved for $38,658/month, which is "only" $463,896 annually (pop up the "Details" button and choose "vs. DynamoDB"):
The same workload on Scylla Cloud would run $29,768 reserved/month, or $357,216 per annum: about 77% of the DynamoDB price, i.e. roughly 23% cheaper.
Of course, all of this is just pure list price. Depending on volume you might be able to negotiate better pricing. However, you'd need a really steep discount for DynamoDB just to get back to Scylla Cloud's list price.
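For anyone who wants to check the arithmetic, here is a quick sketch using only the two monthly list prices quoted above. The variable and function names are mine, and it assumes a flat 12 months at the reserved rate with no other charges. By this calculation Scylla Cloud comes to about 77% of the DynamoDB figure, which is a saving of roughly 23%, and that is also the discount DynamoDB would need just to match Scylla Cloud's list price.

```python
# Monthly list prices quoted in the comment above (USD, 1-year reserved).
DYNAMODB_MONTHLY = 38_658
SCYLLA_CLOUD_MONTHLY = 29_768


def annual(monthly_price):
    """Annualize a monthly price, assuming 12 identical months."""
    return monthly_price * 12


dynamodb_annual = annual(DYNAMODB_MONTHLY)
scylla_annual = annual(SCYLLA_CLOUD_MONTHLY)

# Scylla Cloud cost as a fraction of the DynamoDB cost.
cost_ratio = scylla_annual / dynamodb_annual
# Fraction saved by choosing Scylla Cloud; equivalently, the discount
# DynamoDB would need to match Scylla Cloud's list price.
savings = 1 - cost_ratio

print(f"DynamoDB:     ${dynamodb_annual:,}/year")
print(f"Scylla Cloud: ${scylla_annual:,}/year")
print(f"Scylla Cloud is {cost_ratio:.0%} of the DynamoDB cost, {savings:.0%} cheaper")
```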
Let me know if you spot any math errors or omissions on my part.
> Maybe the situation has changed since 2016. In my experience with several employers since then, it seems like every enterprise architect fell in love with Cassandra around 2014-2015 and then had a long, painful, protracted breakup.
I think 2012-2014 was peak marketing from DataStax. There would be some new major feature with every new blog post, and it would mostly never work as expected. Between 2017 and now, things have settled down.
I've used Cassandra at two companies, and had the exact same experience as you at the first company. At a much bigger company that had some very, very highly paid Cassandra DBAs it was actually a relatively smooth experience.
I don't think it would be appropriate for me to say very specifically, but I suspect about double what a software engineer with the same amount of experience would earn.
Data is big $$$. Slap a couple of NoSQL databases and Spark on your resume and watch the money roll in. DBAs are disappearing with managed services, though.
Yeah, I don't know about "slap." We want you to have deep production experience with these systems. Designing them, deploying them at significant scale, predicting their pitfalls and avoiding them proactively. Diagnosing systemic problems and finding reasonable solutions.
If you can't magically put out production fires, on huge high-throughput systems, potentially in the dead of night, we are unlikely to pay you $300-400K.
The intent was to be a little hyperbolic and self-effacing. In terms of competent and capable developers, I think it’s hard to get a better return on your skill set than adding “data” stuff. And honestly I think it’s one of the most critical skill sets that is lacking across the board. So many big companies have great data engineering teams, but generally other dev teams are left to design their own databases, which is a shit show. And even then, it’s amazing to me how difficult it is for data engineering teams to move from framework to framework without just mapping old solutions onto new technology.
My career has been primarily focused on something like “bringing modern data-driven solutions” to big companies. The one thing that is a constant challenge is that most teams (and leadership) aren’t prepared to handle the responsibilities of data engineering and stewardship in transactional, operational systems. I feel like a critical responsibility when I come on as a consultant is to impart knowledge about managing their data.
To elaborate, I meant 2x compared to SWEs at the same company. I'd prefer not to post exact dollar amounts as they are relative based on location, company, and several other things.
Do you mean BigTable? That's Google's "HBase" insofar as HBase is based on the BT paper.
From what I recall from using it a few years ago, it's pretty damn fast, very low latency. HBase had speedy p50s as well but tended to get quite slow at p99 due to GC.