incubator-cassandra-user mailing list archives

From Allan C <>
Subject Re: Multiget performance
Date Thu, 10 Apr 2014 23:26:48 GMT
I’m running 1.2.16, so there’s likely some improvement I could get moving to 2.0. Rapid
read protection sounds awesome.

Here’s the trace. Interesting that Cassandra reports the query as only taking 25ms in the
trace.
I’ve been running the perf tests using pycassa.

Looks like the amount of data returned has a big effect. When I only return one column, python
reports only 20ms, compared to 150ms when returning the whole row. Rows are each less than
1k in size, so there must be client-side overhead.
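One way to separate client overhead from server time is to time each call on the client and compare it against what the trace reports. This is a pure-Python sketch; `fetch_one_column` and `fetch_full_row` are hypothetical stand-ins for the real pycassa multiget calls (e.g. fetching one column vs. the whole row):

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_ms) for a single client-side call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

# Hypothetical stand-ins; in practice these would be the actual
# driver calls returning one column vs. the full row per key.
def fetch_one_column(keys):
    return {k: {"col0": "v"} for k in keys}

def fetch_full_row(keys):
    return {k: {"col%d" % i: "v" for i in range(20)} for k in keys}

keys = ["key%d" % i for i in range(100)]
narrow, ms_one = timed(fetch_one_column, keys)
full, ms_full = timed(fetch_full_row, keys)
```

The gap between the client-side elapsed time and the traced server time is the client overhead (deserialization, Thrift transport, Python object construction).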


On April 10, 2014 at 1:27:18 AM, DuyHai Doan wrote:
As far as I understand, multiget performance is bound by the slowest node responding
to the coordinator.

If you are fetching 100 partitions spread across n nodes, the coordinator will issue requests to
those nodes and wait until all of the responses have come back before returning the results to
the client.

Consequently, if one node among the n is under heavy load and takes longer to respond, it will
greatly impact the response time of your multiget.
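That "bound by the slowest node" behavior can be illustrated with a toy model (a pure-Python sketch, not driver code): the fan-out completes only when the last node answers, so wall time tracks the maximum latency, not the average.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def query_node(latency_s):
    """Simulate one node serving its share of the partitions."""
    time.sleep(latency_s)
    return latency_s

# Three fast nodes and one under heavy load (made-up latencies).
node_latencies = [0.01, 0.01, 0.01, 0.12]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(node_latencies)) as pool:
    results = list(pool.map(query_node, node_latencies))
elapsed = time.perf_counter() - start

# elapsed is roughly max(node_latencies): the one slow node
# dominates even though the other three answered in 10ms.
```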

Now, with the recent introduction of rapid read protection, this behavior might be mitigated.

 Duy Hai DOAN

On Thu, Apr 10, 2014 at 12:52 AM, Tyler Hobbs <> wrote:
Can you trace the query and paste the results?

On Wed, Apr 9, 2014 at 11:17 AM, Allan C <> wrote:
As one CQL statement:

 SELECT * from Event WHERE key IN ([100 keys]);


On April 9, 2014 at 12:52:13 AM, Daniel Chia wrote:
Are you making the 100 calls in serial, or in parallel?
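The serial-vs-parallel distinction matters a lot for 100 keys: serial round trips add up, while issuing them concurrently overlaps the network waits. A toy sketch (stub fetch function standing in for a real single-partition query):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_row(key):
    """Hypothetical single-partition fetch; pretend ~1ms round trip."""
    time.sleep(0.001)
    return (key, "row")

keys = ["key%d" % i for i in range(100)]

# Serial: 100 round trips back to back, latencies sum.
start = time.perf_counter()
serial = [get_row(k) for k in keys]
serial_ms = (time.perf_counter() - start) * 1000.0

# Parallel: round trips overlap, wall time drops sharply.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    parallel = list(pool.map(get_row, keys))
parallel_ms = (time.perf_counter() - start) * 1000.0
```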


On Tue, Apr 8, 2014 at 11:22 PM, Allan C <> wrote:
Hi all,

I’ve always been told that multigets are a Cassandra anti-pattern for performance reasons.
I ran a quick test tonight to prove it to myself, and, sure enough, slowness ensued. It takes
about 150ms to get 100 keys for my use case. Not terrible, but at least an order of magnitude
slower than what I need it to be.

So far, I’ve been able to denormalize and not have any problems. Today, I ran into a use
case where denormalization introduces a huge amount of complexity to the code.

It’s very tempting to cache a subset in Redis and call it a day — probably will. But,
that’s not a very satisfying answer. It’s only about 5GB of data and it feels like I should
be able to tune a Cassandra CF to be within 2x.

The workload is around 70% reads. Most of the writes are updates to existing data. Currently,
it’s in an LCS CF with ~30M rows. The cluster is 300GB total with 3-way replication, running
across 12 fairly large boxes with 16GB RAM each, all on SSDs, striped across 3 AZs in AWS
(hi1.4xlarges).
Has anyone had success getting good results for this kind of workload? Or, is Cassandra just
not suited for it at all and I should just use an in-memory store?


Tyler Hobbs
