cassandra-user mailing list archives

From Jeremy Jongsma <jer...@barchart.com>
Subject Re: Large number of row keys in query kills cluster
Date Tue, 10 Jun 2014 23:15:09 GMT
I didn't explain clearly: I'm not requesting 20,000 unknown keys (which would
result in a full scan); I'm requesting 20,000 specific rows by key.
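
For illustration, one common way to reduce the load of a single very large multi-key read is to split the key list into smaller batches and issue one query per batch. The sketch below only builds the statements and does not talk to a cluster. The names `chunked` and `build_statements`, the table and column names, the `%s` placeholder style, and the batch size of 100 are all assumptions made for this example, not details from the thread, and nothing here confirms that batching would have avoided the out-of-memory behaviour described.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunked(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists containing at most `size` keys."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_statements(
    table: str, key_column: str, keys: Iterable[str], size: int = 100
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (cql, params) pairs, one per batch of keys.

    The table and column names are hypothetical and are interpolated
    directly, so they must come from trusted code, never from user input.
    The key values are passed separately as bind parameters.
    """
    for batch in chunked(keys, size):
        placeholders = ", ".join(["%s"] * len(batch))
        cql = f"SELECT * FROM {table} WHERE {key_column} IN ({placeholders})"
        yield cql, batch


if __name__ == "__main__":
    # 20,000 specific keys, as in the message above (values are made up).
    all_keys = [f"key-{i}" for i in range(20000)]
    statements = list(build_statements("my_table", "key", all_keys, size=100))
    print(len(statements), "statements of up to 100 keys each")
```

Each `(cql, params)` pair could then be handed to whichever client is in use, with the number of in-flight requests capped so that no single coordinator has to assemble all 20,000 rows at once.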
On Jun 10, 2014 6:02 PM, "DuyHai Doan" <doanduyhai@gmail.com> wrote:

> Hello Jeremy
>
> Basically, what you are doing is asking Cassandra to do a distributed full
> scan on all the partitions across the cluster, so it's normal that the nodes
> are somewhat... stressed.
>
> How did you make the query? Are you using Thrift or CQL3 API?
>
> Please note that there is another way to get all partition keys: SELECT
> DISTINCT <partition_key> FROM..., more details here:
> www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
>
> > I ran an application today that attempted to fetch 20,000+ unique row keys
> > in one query against a set of completely empty column families. On a 4-node
> > cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
> > heap), every single node immediately ran out of memory and became
> > unresponsive, to the point where I had to kill -9 the cassandra processes.
> >
> > Now clearly this query is not the best idea in the world, but the effects
> > of it are a bit disturbing. What could be going on here? Are there any
> > other query pitfalls I should be aware of that have the potential to
> > explode the entire cluster?
> >
> > -j
>
