
I've used cassandra quite a bit and even I had to go back and figure out what this primary key means:

    ((channel_id, bucket), message_id)
The primary key consists of partition key + clustering columns, so this says that channel_id & bucket are the partition key, and message_id is the one and only clustering column (you can have more).
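For reference, a table with that key might be declared something like this (a minimal sketch; the non-key columns are assumptions, not the article's actual schema):

    CREATE TABLE messages (
        channel_id bigint,
        bucket     int,
        message_id bigint,
        author_id  bigint,
        content    text,
        PRIMARY KEY ((channel_id, bucket), message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC);

channel_id and bucket together determine which nodes own a row; message_id only orders rows within that partition.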

They also cite the most common cassandra mistake, which is not understanding that your partition key has to limit partition size to less than 300MB, and no surprise: They had to craft the "bucket" column as a function of message date-time because that's usually the only way to prevent a partition from eventually growing too large. Anyhow, this is incredibly important if you don't want to suffer a catastrophic failure months/years after you thought everything was good to go.
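Note that CQL won't derive the bucket for you: the application has to compute it on every write and every read. A hedged sketch of the usual scheme (the window size and literal values are made up for illustration):

    -- The app derives "bucket" from the message timestamp, e.g. a fixed
    -- time window: bucket = unix_ms / (10 * 24 * 3600 * 1000)
    INSERT INTO messages (channel_id, bucket, message_id, author_id, content)
    VALUES (1234, 1575, 987654321098765, 42, 'hello');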

They didn't mention this part: Oh, I have to include all partition key columns in every query's "where" clause, so... I have to run as many queries as are needed for the time period of data I want to see, and stitch the results together... ugh... Yeah it's a little messy.
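Concretely, a "last 50 messages" read that straddles a bucket boundary becomes one query per bucket, merged client-side (bucket values are illustrative, reusing the sketch table above):

    SELECT * FROM messages WHERE channel_id = 1234 AND bucket = 1575
        AND message_id < 987654321098765 LIMIT 50;
    -- not enough rows yet? step back one bucket and query again
    SELECT * FROM messages WHERE channel_id = 1234 AND bucket = 1574 LIMIT 50;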



Reading the article, I went straight to the Cassandra site to figure out what the catch is. Surely there has to be a catch.

Well, here it is. The partitioning is manual, right up to the SQL level.


The bigger catch is that when your partition grows too big and your nodes are hit by the OOM killer, you have very few options other than creating a new table and replaying the data, or using a CLI tool to manually repartition your data while the node is offline.

Using Cassandra tends to mean pushing costs to your developers instead of spending more money on storage resources, and your devs will almost certainly spend a ton of time fixing downed nodes.

Apple supplied some of the biggest contributors to Cassandra, who were optimizing things like reading data from a partition without pulling the whole partition into memory, to avoid the terrible GC cost. They put in a ton of engineering effort that probably could have been better spent elsewhere if they'd used a different database.


Cassandra won't try to load a whole partition into memory. It doesn't work that way. The only way you would get behavior like that is by tacking ALLOW FILTERING onto your query. ALLOW FILTERING is a dedicated keyword for "I know I shouldn't do this but I'm going to anyway". If you're trying to run those types of queries, use a different database. If someone is making you use a transactional database without joins for an analytical load, get a different job, because that's a nightmare.
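For a concrete illustration, using the hypothetical messages table sketched upthread:

    -- Rejected by default: author_id is neither a partition key nor a
    -- clustering column, so answering this means scanning partitions.
    SELECT * FROM messages WHERE author_id = 42;

    -- Accepted only with the explicit opt-in, at full-scan cost:
    SELECT * FROM messages WHERE author_id = 42 ALLOW FILTERING;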

Also, your partitions should never get that large. If you're designing your tables in such a way that the partitions grow unbounded, there's an issue. There are lots of ways to ensure that the cardinality of partitions grows as the dataset grows, and you control this behavior by managing the partitioning. It's really easy to grok the distribution of data on disk if you think about how it's keyed.
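The standard trick is to fold a time or hash component into the partition key, as in the bucket example upthread. A made-up sketch:

    -- Bound each partition to one sensor-day instead of letting a single
    -- sensor's partition grow forever as readings accumulate.
    CREATE TABLE readings (
        sensor_id    uuid,
        day          date,
        reading_time timestamp,
        value        double,
        PRIMARY KEY ((sensor_id, day), reading_time)
    );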

You've basically listed a bunch of examples of what happens when you don't use a wide columnar store correctly. If you're constantly fixing downed nodes, you're probably running the cluster on hardware from Goodwill.

This is a pretty good list of what not to do with Cassandra, or any similar database. https://blog.softwaremill.com/7-mistakes-when-using-apache-c...


A huge partition is often spread across multiple sstables, and it often has tombstone issues if it consists of a large number of column keys or sees any regular update cycle, which is often the case for hot rows.

In that case, the overhead of collecting all the pieces of the data you need from the different sstables, and then processing the tombstones, can lead to a lot of memory stress.


What would you recommend now instead?


Kind of the million dollar question. People like to complain about cassandra, but that just brings up the adage about C++: people complain about the things they use.

But let's not pretend that cassandra isn't almost always a bear. The other problem is that cassandra keeps things up (and never gets the credit for it) but that creates a host of edge cases and management headaches (which makes management hate it).

Most competitors abandon AP for CP (HBase and Cockroach and I think FoundationDB) in order to get joins and SQL, but the BFD on cassandra is the AP design.

Scylla did a C++ rewrite to address tail latency due to JVM GC, but after an explosive release cycle, they basically stalled at partial 2.2 compatibility. Rocksandra isn't in mainline and doesn't appear to be worked on anymore.

I follow the Jepsen tests a lot: they don't seem to have found a magic solution.

I think Cassandra stopped short of some key OSS deliverables, and I think they could simplify the management as well, both with a UI for admin and with some re-jiggering of how some things work on nodes. The devs are simply swamped with stability and features right now.

And Datastax won't help that much: what admin UI cassandra had was abandoned, and I half think the reason they acquired TLP was that TLP was producing/sponsoring useful admin tooling.

I would love to try something new. What appeals to me about cassandra is the fundamentals of the design, and the fair amount of transparency there is (although there is still some marketing bullcrap that surrounds it, like "CQL is like SQL" and other big lies).

So many other NoSQLs have bolt-on capabilities for handling distribution, which Jepsen exposes (MongoDB famously), and have sooo much bullcrap in their claims. All the NoSQLs are desperate for market share, so they all lie about CAP and the edge cases.

Purely distributed databases are VERY HARD and are open to exaggeration, handwaving, and false demonstrations by the salesmen, but those people won't be around when you need a database like this to shine: when the shit hits the fan, you lose an entire datacenter, or similar things.


> Scylla did a C++ rewrite to address tail latency due to JVM GC, but after an explosive release cycle, they basically stalled at partial 2.2 compatibility

Could you explain this more? Because Scylla has had pretty steady major release updates over the past few years. See the timeline of updates here:

https://www.scylladb.com/2021/08/24/apache-cassandra-4-0-vs-...

We have long since passed C* 3.11 compatibility. In fact, if anything, Scylla, while maintaining a tremendous amount of Cassandra compatibility, now offers better implementations of features present in Cassandra (Materialized Views, Secondary Indexes, Change Data Capture, Lightweight Transactions), plus unique features of its own — incremental compaction and workload prioritization.

But if there's something in particular you're thinking of, I'm open to hearing more about how you see it.



