incubator-cassandra-user mailing list archives

From Saurabh Sehgal <>
Subject Re: data aggregation in Cassandra
Date Fri, 25 Mar 2011 18:38:14 GMT
Thanks for all the responses.

My follow-up questions are:

- Should I go with the OrderPreservingPartitioner, keyed on timestamps, so I
can do time range queries? Is this recommended? Are there any special cases
regarding load balancing that I need to keep in mind? I have read on
blogs/forums that RandomPartitioner yields better load balancing and that
using OrderPreservingPartitioner is discouraged. Can someone expand/comment
on this?

- Also, let's say I query all data between timestampuuid1 and
timestampuuid2 (a span of several weeks). In my case, this could
return anywhere from 20 to 30 million records. How would I go about aggregating
this data "by hand"? Will this perform?
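
A rough sketch of what aggregating "by hand" over a large time range could look like: page through the range in key order and keep only running totals in memory, so the 20 to 30 million rows are never held at once. Everything named here is an assumption for illustration: `fetch_page` is a hypothetical stand-in for a client range query (it reads from an in-memory list, not from Cassandra), the keys are plain integers rather than TimeUUIDs, and the attribute names are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, Dict[str, float]]  # (timestamp key, attribute columns)

# Stand-in data set; a real scan would read from the cluster instead.
FAKE_STORE: List[Row] = [
    (ts, {"bytes": float(ts % 7), "errors": float(ts % 2)}) for ts in range(100)
]


def fetch_page(start: int, end: int, limit: int) -> List[Row]:
    """Hypothetical range query: up to `limit` rows with start <= key <= end."""
    return [row for row in FAKE_STORE if start <= row[0] <= end][:limit]


def scan(start: int, end: int, page_size: int) -> Iterator[Row]:
    """Yield every row in [start, end] one page at a time."""
    cursor = start
    while cursor <= end:
        page = fetch_page(cursor, end, page_size)
        if not page:
            return
        yield from page
        cursor = page[-1][0] + 1  # resume just past the last key seen


def aggregate(rows: Iterable[Row], attributes: List[str]) -> Dict[str, Tuple[float, int]]:
    """Return (sum, count) per attribute using constant memory per attribute."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _, columns in rows:
        for name in attributes:
            if name in columns:
                totals[name] += columns[name]
                counts[name] += 1
    return {name: (totals[name], counts[name]) for name in attributes}


result = aggregate(scan(10, 59, page_size=8), ["bytes", "errors"])
```

Memory stays bounded by the page size and the number of attributes, so whether this performs comes down mostly to how fast the pages can be read. With TimeUUID keys the cursor would resume from the last key returned and skip the duplicate row instead of adding 1.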

I am only interested in aggregating over a finite set of 10-20
attributes, so does it make more sense to have one column family per
attribute? In that case I would not need to do any aggregation, since all the
data for an attribute would reside in one column family. Is there an upper
bound on the number of column families Cassandra currently supports?
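
On the partitioner question, the load-balancing concern can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: a 4-node ring with equal ranges, integer timestamps as keys, and an ordered ring split evenly over a fixed time window. The function names are hypothetical, and real token assignment is more involved. It shows only that MD5-hashed keys spread across nodes, while a burst of recent timestamps under ordered placement lands on a single node.

```python
import hashlib
from collections import Counter

NODES = 4


def random_node(key: str) -> int:
    """Hash the key with MD5 and map the 128-bit value onto NODES equal ranges."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest * NODES // 2**128


def ordered_node(ts: int, start: int, end: int) -> int:
    """Place the key by its own order: the window [start, end) is split evenly."""
    span = (end - start) / NODES
    return min(int((ts - start) / span), NODES - 1)


# The ring covers timestamps 0..3999; all new writes fall in the latest quarter.
recent_writes = range(3000, 4000)

hashed_load = Counter(random_node(str(ts)) for ts in recent_writes)
ordered_load = Counter(ordered_node(ts, 0, 4000) for ts in recent_writes)

print("hashed placement: ", dict(sorted(hashed_load.items())))
print("ordered placement:", dict(sorted(ordered_load.items())))
```

The ordered layout keeps time range scans contiguous, which is the attraction, but the node that owns the most recent range takes all of the current write load unless the ranges are rebalanced as time moves forward.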

On Fri, Mar 25, 2011 at 7:31 AM, buddhasystem <> wrote:

> Hello Saurabh,
> I have a similar situation, with a more complex data model, and I do an
> equivalent of map-reduce "by hand". The redeeming value is that you have
> complete freedom in how you hash, and you design the way you store indexes
> and similar structures. If there is a pattern in data store, you use it to
> your advantage. In the end, you get good performance.