incubator-cassandra-user mailing list archives

From Guillermo Winkler <gwink...@inconcertcc.com>
Subject Re: What's the best modeling approach for ordering events by date?
Date Fri, 15 Apr 2011 00:00:59 GMT
Hi Ethan,

I want to present the events ordered by time, always in pages of 20/40
events. If the events are tweets, you can have 1000 tweets from the same
second or you can have 30 tweets in a 10-minute range. But I always want to be
able to page through the results in an orderly fashion.

I think using seconds since epoch is what I'm already doing, that is, dividing
time into a fixed series of intervals. Each second is an interval, and all of
the events for that particular second are columns of that row.

Again with tweets, for easier visualization:

TweetsBySecond : {
    12121121212 : {    -> seconds since epoch
        id1, id2, id3  -> all the tweet ids that occurred in that particular second
    },
    12121212123 : {
        id4, id5
    },
    12121212124 : {
        id6
    }
}
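
To make the paging requirement concrete, here is a minimal sketch of the layout above with a plain Python dict standing in for the column family. Nothing in it is Cassandra API: `add_event`, `page_events`, and the `(second, offset)` cursor are hypothetical names made up for illustration, and the sketch assumes the client walks the second-keys itself by incrementing an integer.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

# Hypothetical in-memory stand-in for TweetsBySecond:
# row key = seconds since epoch, columns = event ids in insertion order.
Store = DefaultDict[int, List[str]]
Cursor = Tuple[int, int]  # (second, number of columns already read in that row)


def add_event(store: Store, timestamp: float, event_id: str) -> None:
    """File the event under the one-second bucket its timestamp falls in."""
    store[int(timestamp)].append(event_id)


def page_events(
    store: Store,
    start_second: int,
    end_second: int,
    page_size: int = 20,
    offset: int = 0,
) -> Tuple[List[Tuple[int, str]], Optional[Cursor]]:
    """Return up to page_size (second, event_id) pairs in time order.

    The second return value is a cursor to resume from, or None once the
    range [start_second, end_second] has been exhausted.
    """
    page: List[Tuple[int, str]] = []
    second = start_second
    while second <= end_second:
        ids = store.get(second, [])  # .get() avoids creating empty rows
        for i in range(offset, len(ids)):
            page.append((second, ids[i]))
            if len(page) == page_size:
                return page, (second, i + 1)
        offset = 0  # later rows are always read from their first column
        second += 1
    return page, None


store: Store = defaultdict(list)
for ts, eid in [(100.2, "id1"), (100.7, "id2"), (100.9, "id3"),
                (101.0, "id4"), (101.5, "id5"), (103.3, "id6")]:
    add_event(store, ts, eid)

page1, cursor1 = page_events(store, 100, 103, page_size=2)
page2, cursor2 = page_events(store, cursor1[0], 103, page_size=2,
                             offset=cursor1[1])
```

A page can start in the middle of a busy second and span several quiet ones, which is why the cursor carries both the row key and a position within the row. The loop visits every second in the range, including empty ones, so sparse periods cost one lookup per second.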

The problem is that you can't do that using OPP in Cassandra 0.7, or am I just
missing something?

Thanks for your answer,
Guille

On Thu, Apr 14, 2011 at 4:49 PM, Ethan Rowe <ethan@the-rowes.com> wrote:

> How do you plan to read the data?  Entire histories, or in relatively
> confined slices of time?  Do the events have any attributes by which you
> might segregate them, apart from time?
>
> If you can divide time into a fixed series of intervals, you can insert
> members of a given interval as columns (or supercolumns) in a row.  But it
> depends how you want to use the data on the read side.
>
>
> On Thu, Apr 14, 2011 at 12:25 PM, Guillermo Winkler <
> gwinkler@inconcertcc.com> wrote:
>
>> I have a huge number of events I need to consume later, ordered by the
>> date the event occurred.
>>
>> My first approach to this problem was to use seconds since epoch as row
>> key, and event ids as column names (empty value), this way:
>>
>> EventsByDate : {
>>     SecondsSinceEpoch: {
>>         evid:"", evid:"", evid:""
>>     }
>> }
>>
>> And use OPP as partitioner, using GetRangeSlices to retrieve the ordered
>> events sequentially.
>>
>> Now I have two problems to solve:
>>
>> 1) The system is realtime, so all the events in a given moment are hitting
>> the same box
>> 2) After migrating from Cassandra 0.6 to Cassandra 0.7, OPP doesn't seem to
>> like LongType for row keys. Was this purposely deprecated?
>>
>> I was thinking about secondary indexes, but they do not guarantee the order
>> in which the rows come out of Cassandra.
>>
>> Does anyone have a better approach to modeling events by date given those
>> restrictions?
>>
>> Thanks,
>> Guille
>>
>>
>>
>


