incubator-cassandra-user mailing list archives

From Allan C <alla...@gmail.com>
Subject Re: Multiget performance
Date Wed, 09 Apr 2014 16:17:27 GMT
As one CQL statement:

 SELECT * from Event WHERE key IN ([100 keys]);

-Allan
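
For readers comparing the two access patterns raised in this thread (one IN statement versus many single-key reads issued concurrently), here is a minimal Python sketch. It is illustrative only: `build_in_query`, `fetch_one`, and `fetch_parallel` are hypothetical names, and `fetch_one` is a stub standing in for whatever single-key read your client driver provides. Nothing here comes from the original messages beyond the `Event` table and the `key` column.

```python
from concurrent.futures import ThreadPoolExecutor


def build_in_query(table, key_count):
    """Build one multiget statement with a bind marker per key."""
    placeholders = ", ".join(["?"] * key_count)
    return f"SELECT * FROM {table} WHERE key IN ({placeholders});"


def fetch_one(key):
    """Hypothetical stub for a single-key read.

    A real version would execute something like
    'SELECT * FROM Event WHERE key = ?' through a driver session.
    """
    return {"key": key}


def fetch_parallel(keys, fetch=fetch_one, max_workers=16):
    """Issue one read per key concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, keys))


if __name__ == "__main__":
    print(build_in_query("Event", 3))
    print(fetch_parallel(["a", "b", "c"]))
```

With a single IN statement, one coordinator node has to gather every row before replying, whereas per-key reads can each be routed independently. Which approach is faster for a given cluster is something to measure rather than assume.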

On April 9, 2014 at 12:52:13 AM, Daniel Chia (danchia@coursera.org) wrote:

Are you making the 100 calls in serial, or in parallel?

Thanks,
Daniel


On Tue, Apr 8, 2014 at 11:22 PM, Allan C <allanca@gmail.com> wrote:
Hi all,

I’ve always been told that multigets are a Cassandra anti-pattern for performance reasons.
I ran a quick test tonight to prove it to myself, and, sure enough, slowness ensued. It takes
about 150ms to get 100 keys for my use case. Not terrible, but at least an order of magnitude
slower than what I need it to be.

So far, I’ve been able to denormalize and not have any problems. Today, I ran into a use
case where denormalization introduces a huge amount of complexity to the code.

It’s very tempting to cache a subset in Redis and call it a day, and I probably will. But
that’s not a very satisfying answer. It’s only about 5GB of data, and it feels like I should
be able to tune a Cassandra CF to be within 2x.

The workload is around 70% reads. Most of the writes are updates to existing data. Currently,
it’s in an LCS CF with ~30M rows. The cluster is 300GB total with 3-way replication, running
across 12 fairly large boxes with 16G RAM. All on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges,
fwiw).


Has anyone had success getting good results for this kind of workload? Or, is Cassandra just
not suited for it at all and I should just use an in-memory store?

-Allan

