incubator-cassandra-user mailing list archives

From Saurabh Sehgal <saurabh....@gmail.com>
Subject help modeling a requirement in cassandra
Date Sat, 26 Mar 2011 02:00:22 GMT
I had another question that ties in with my requirement.

How efficient is it to move data from one column family to another column
family?

Basically, what I want to do is keep track of how "old" a certain data point
is. I have one column family that maintains those data points, and I want to
be able to define logical groupings for those data points as:

today
1 day old data
2 day old data
3 day old data
4 day and older data
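
If each data point is stored together with its own timestamp (an assumption; the message does not say how points are keyed), the grouping above can be derived at read time instead of being maintained by moving data. A minimal Python sketch, assuming ages are counted in UTC calendar days; `age_group` and `GROUPS` are hypothetical names:

```python
from datetime import datetime, timezone

# The logical groupings listed above; the list index is the age in whole
# days, and the last entry covers everything 4 or more days old.
GROUPS = ["today", "1 day old", "2 day old", "3 day old", "4 day and older"]


def age_group(point_ts: datetime, now: datetime) -> str:
    """Return the logical group of a data point, counting UTC calendar days."""
    age_days = (now.astimezone(timezone.utc).date()
                - point_ts.astimezone(timezone.utc).date()).days
    if age_days < 0:
        raise ValueError("data point is newer than 'now'")
    return GROUPS[min(age_days, len(GROUPS) - 1)]


if __name__ == "__main__":
    now = datetime(2011, 3, 26, 2, 0, tzinfo=timezone.utc)
    for ts in (datetime(2011, 3, 26, 0, 30, tzinfo=timezone.utc),
               datetime(2011, 3, 25, 23, 59, tzinfo=timezone.utc),
               datetime(2011, 3, 20, tzinfo=timezone.utc)):
        print(ts.isoformat(), "->", age_group(ts, now))
```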

Is there an efficient way to accomplish this and maintain a window that
rolls up like that? For example, a batch operation that, after each day has
elapsed, moves all data from "today" to "1 day old", all data from "1 day old"
to "2 day old", and so on?

I know this is possible, since Cloudkick did something similar:
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/

However, I am not sure how to implement the "rollup" process itself. Any
suggestions? Any input is greatly appreciated.
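
The daily rollup described in this message can be sketched in Python with in-memory dictionaries standing in for the five column families. This is a hypothetical model (the bucket names come from the list of groupings; `roll_up` and the row layout are assumptions), not Cassandra client code:

```python
from typing import Dict

# Hypothetical in-memory stand-in for the five column families:
# bucket name -> {row key -> {column name -> value}}.
Buckets = Dict[str, Dict[str, Dict[str, str]]]

BUCKET_ORDER = ["today", "1 day old", "2 day old", "3 day old", "4 day and older"]


def roll_up(buckets: Buckets) -> None:
    """Shift every bucket one day older; meant to run once per elapsed day."""
    oldest = buckets[BUCKET_ORDER[-1]]
    # The open-ended last bucket absorbs the bucket just before it,
    # merging columns row by row.
    for row_key, columns in buckets[BUCKET_ORDER[-2]].items():
        oldest.setdefault(row_key, {}).update(columns)
    # Walk from old to young so no bucket is overwritten before it has moved.
    for i in range(len(BUCKET_ORDER) - 2, 0, -1):
        buckets[BUCKET_ORDER[i]] = buckets[BUCKET_ORDER[i - 1]]
    # A fresh, empty "today" bucket starts the new day.
    buckets[BUCKET_ORDER[0]] = {}


if __name__ == "__main__":
    b: Buckets = {name: {} for name in BUCKET_ORDER}
    b["today"]["sensor1"] = {"t1": "a"}
    b["3 day old"]["sensor1"] = {"t0": "z"}
    roll_up(b)
    print(b)
```

In Cassandra itself, each shift would mean reading every row of one column family and writing and deleting it in another, so the daily cost grows with the data volume. Storing a timestamp with each data point and computing its group at read time avoids the copy entirely.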

Thank you,

Saurabh

On Fri, Mar 25, 2011 at 11:38 AM, Saurabh Sehgal <saurabh.r.s@gmail.com> wrote:

> Thanks for all the responses.
>
> My main questions, then, are:
>
> - Should I go with the OrderPreservingPartitioner, keyed on timestamps, so I
> can do time range queries? Is this recommended? Are there any special cases
> regarding load balancing I need to keep in mind? I have read on blogs and
> forums that RandomPartitioner yields better load balancing and that using
> OrderPreservingPartitioner is discouraged. Can someone expand or comment
> on this?
>
> - Also, let's say I query all partitioned data between timestampuuid1 and
> timestampuuid2 (spanning several weeks). In my case, this could return
> anywhere from 20 to 30 million records. How would I go about aggregating
> this data "by hand"? Will this perform well?
>
> Since I am only interested in aggregating over a finite set of 10-20
> attributes, does it make more sense to have a column family per attribute?
> In that case, I would not need to do any aggregation, since all the data for
> that attribute would reside in one column family. Is there an upper bound on
> the number of column families Cassandra currently supports?
>
>
>
> On Fri, Mar 25, 2011 at 7:31 AM, buddhasystem <potekhin@bnl.gov> wrote:
>
>> Hello Saurabh,
>>
>> I have a similar situation, with a more complex data model, and I do the
>> equivalent of map-reduce "by hand". The redeeming value is that you have
>> complete freedom in how you hash, and you design the way you store indexes
>> and similar structures. If there is a pattern in the data store, you can use
>> it to your advantage. In the end, you get good performance.
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/data-aggregation-in-Cassandra-tp6206994p6207879.html
>> Sent from the cassandra-user@incubator.apache.org mailing list archive at
>> Nabble.com.
>
>
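
On the partitioner question: a toy Python model of why sequential timestamp row keys concentrate on one node under an order-preserving partitioner, but spread out under RandomPartitioner, which derives each key's token from an MD5 hash of the key. The four-node ring, its token boundaries, and all helper names are hypothetical, and the real token range and ownership rules are simplified here.

```python
import bisect
import hashlib
from collections import Counter
from datetime import datetime, timedelta
from typing import List

NODES = 4

# Hashed placement (simplified RandomPartitioner): token = MD5 of the key.
# Hypothetical ring: node i owns tokens up to and including RING_TOKENS[i].
RING_TOKENS = [(i + 1) * (2**128 // NODES) - 1 for i in range(NODES)]

# Ordered placement (simplified order-preserving partitioner): token = key.
# Hypothetical boundaries: node i owns keys up to ORDERED_TOKENS[i].
ORDERED_TOKENS = ["2011-04-01", "2011-07-01", "2011-10-01", "2012-01-01"]


def random_owner(key: str) -> int:
    token = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return bisect.bisect_left(RING_TOKENS, token) % NODES  # wrap past the end


def ordered_owner(key: str) -> int:
    return bisect.bisect_left(ORDERED_TOKENS, key) % NODES  # wrap past the end


def hourly_keys(start: datetime, hours: int) -> List[str]:
    """Row keys for consecutive hours, formatted so string order = time order."""
    return [(start + timedelta(hours=h)).strftime("%Y-%m-%dT%H:%M")
            for h in range(hours)]


if __name__ == "__main__":
    keys = hourly_keys(datetime(2011, 3, 19), 24 * 7)  # one week of hourly keys
    print("order-preserving:", Counter(ordered_owner(k) for k in keys))
    print("random (MD5):   ", Counter(random_owner(k) for k in keys))
```

In this model every key from the sample week lands on node 0 with ordered placement, while hashed placement spreads them across all four nodes. The trade-off is that hashed keys no longer come back in time order from a key-range scan, so a commonly suggested alternative is to keep RandomPartitioner and put the time ordering in column names within a row.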

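On aggregating 20-30 million rows "by hand": the key point is to stream the result page by page and keep only running totals, rather than loading every row at once. A minimal Python sketch; `fetch_page`, the row-key format, and the `latency` attribute are hypothetical, and the in-memory list only stands in for a real range query.

```python
import bisect
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Record = Dict[str, float]
Page = List[Tuple[str, Record]]


def iter_records(fetch_page: Callable[[Optional[str], int], Page],
                 page_size: int = 1000) -> Iterator[Tuple[str, Record]]:
    """Yield every (row key, record) pair one page at a time.

    fetch_page(start_key, count) is a hypothetical stand-in for a key-range
    query that includes start_key itself, the usual way Thrift range slices
    were paged.
    """
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start: Optional[str] = None
    while True:
        page = fetch_page(start, page_size)
        if start is not None and page and page[0][0] == start:
            page = page[1:]  # this row was the last one of the previous page
        if not page:
            return
        yield from page
        start = page[-1][0]


def aggregate(records: Iterator[Tuple[str, Record]],
              attributes: List[str]) -> Dict[str, Dict[str, float]]:
    """Keep running count/sum/min/max per attribute, not the records."""
    stats: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")})
    for _, record in records:
        for attr in attributes:
            if attr in record:
                s, value = stats[attr], record[attr]
                s["count"] += 1
                s["sum"] += value
                s["min"] = min(s["min"], value)
                s["max"] = max(s["max"], value)
    return dict(stats)


if __name__ == "__main__":
    # In-memory rows standing in for the result of a time-range query.
    data = [(f"row{i:05d}", {"latency": float(i % 7)}) for i in range(10_000)]
    keys = [k for k, _ in data]

    def fetch_page(start: Optional[str], count: int) -> Page:
        i = 0 if start is None else bisect.bisect_left(keys, start)
        return data[i:i + count]

    print(aggregate(iter_records(fetch_page, page_size=500), ["latency"]))
```

Because count, sum, min, and max merge cleanly, several workers could each scan a disjoint key range and combine their partial results afterwards, which is close to the map-reduce "by hand" approach described earlier in the thread.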