I’ve always been told that multigets are a Cassandra anti-pattern for performance reasons. I ran a quick test tonight to prove it to myself, and, sure enough, slowness ensued. It takes about 150ms to get 100 keys for my use case. Not terrible, but at least an order of magnitude slower than I need it to be.
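For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not my actual test code: `fake_fetch_many` is a hypothetical stand-in for whatever multiget call your client exposes, and `time_batch` and `per_key_ms` are names made up for illustration. The arithmetic at the end just restates the numbers above (150ms for 100 keys, and a target at least 10x lower).

```python
import time

def time_batch(fetch_many, keys, repeats=5):
    """Return the best wall-clock time (ms) for fetching all keys in one call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fetch_many(keys)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best

def per_key_ms(batch_ms, n_keys):
    """Average cost per key for a batch that took batch_ms in total."""
    return batch_ms / n_keys

def fake_fetch_many(keys):
    # Hypothetical stand-in: replace with the real multiget call of your client.
    return {k: None for k in keys}

keys = ["key-%d" % i for i in range(100)]
batch_ms = time_batch(fake_fetch_many, keys)

# Numbers from the post: 150 ms per 100-key multiget, target at least 10x lower.
observed_per_key_ms = per_key_ms(150.0, 100)
target_batch_ms = 150.0 / 10
```

Taking the best of several runs is a deliberate choice here, since it filters out one-off stalls; use a percentile instead if tail latency is what matters to you.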
So far, I’ve been able to denormalize and not have any problems. Today, I ran into a use case where denormalization introduces a huge amount of complexity to the code.
It’s very tempting to cache a subset in Redis and call it a day, and I probably will. But that’s not a very satisfying answer. It’s only about 5GB of data, and it feels like I should be able to tune a Cassandra CF to be within 2x of that.
The workload is around 70% reads. Most of the writes are updates to existing data. Currently, it’s in an LCS CF with ~30M rows. The cluster is 300GB total with 3-way replication, running across 12 fairly large boxes with 16G RAM. All on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges, fwiw).
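Some back-of-the-envelope arithmetic on those numbers, as a rough sketch. The assumptions are mine and worth double-checking: that the 300GB total already includes all three replicas, that the ~5GB data set is the pre-replication size of this CF, and that the ~30M rows all belong to it.

```python
GB = 10**9  # decimal gigabytes, good enough for a rough estimate

cluster_total_gb = 300      # assumed to include all replicas
replication_factor = 3
nodes = 12
hot_set_gb = 5              # assumed to be the pre-replication size of this CF
rows = 30_000_000

per_node_gb = cluster_total_gb / nodes                     # on-disk data per node
unique_gb = cluster_total_gb / replication_factor          # data before replication
hot_per_node_gb = hot_set_gb * replication_factor / nodes  # this CF's share per node
bytes_per_row = hot_set_gb * GB / rows                     # average row size

print(per_node_gb, unique_gb, hot_per_node_gb, round(bytes_per_row))
```

Under those assumptions, each node holds a little over 1GB of this CF, in rows averaging well under 200 bytes, which is why it feels like the whole thing ought to be servable from memory.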
Has anyone had success getting good results for this kind of workload? Or is Cassandra just not suited for it, and I should use an in-memory store instead?