incubator-cassandra-user mailing list archives

From Brandon Williams <dri...@gmail.com>
Subject Re: Storing lots of data as Columns in a Column Family (ref Twissandra)
Date Thu, 18 Mar 2010 21:14:40 GMT
On Thu, Mar 18, 2010 at 4:08 PM, Muhammed Nasrullah <nasrullah@gmail.com> wrote:

> Hello folks,
>
> Twissandra <http://twissandra.com/> (Twitter clone example for Cassandra)
> has a public page where every public update/tweet is stored in a column
> family under the key !public! like so:
>
> Userline = {
>     '!public!': {
>         # timestamp of tweet: tweet id
>         1267414247561777: '7561a442-24e2-11df-8924-001ff3591711',
>         1267414277402340: 'f0c8d718-24e2-11df-8924-001ff3591711',
>         1267414305866969: 'f9e6d804-24e2-11df-8924-001ff3591711',
>         1267414319522925: '02ccb5ec-24e3-11df-8924-001ff3591711',
>     },
> }
>
>
> My question: because this is the public timeline, it will receive a lot of
> updates, and because it is a single row keyed by '!public!', it will
> eventually not fit in memory. Is there a better way to model this? The
> data needs to be retrieved in reverse chronological order, which cannot be
> done with a range of keys unless the start and finish keys are known in
> advance.


The rows could be named and partitioned by date/time, which can be known in
advance.  For example, '!public!20100318' could contain the public timeline
for that day.
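
As a rough illustration of the idea, here is a minimal in-memory sketch in
Python. It uses plain dicts in place of a column family, not a real Cassandra
client. The helper names (row_key, add_tweet, latest), the UTC day boundary,
and the max_days cutoff are all assumptions made for this example; they are
not part of Twissandra.

```python
from datetime import datetime, timedelta, timezone

PREFIX = "!public!"  # row-key prefix taken from the example above

def row_key(ts_micros):
    """Map a microsecond timestamp to its daily row key (UTC is an assumption)."""
    day = datetime.fromtimestamp(ts_micros // 1_000_000, tz=timezone.utc)
    return PREFIX + day.strftime("%Y%m%d")

def add_tweet(userline, ts_micros, tweet_id):
    """Store the tweet id as a column in the row for its day."""
    userline.setdefault(row_key(ts_micros), {})[ts_micros] = tweet_id

def latest(userline, now_micros, limit, max_days=7):
    """Return up to `limit` tweet ids, newest first, walking back day by day."""
    if limit <= 0:
        return []
    day = datetime.fromtimestamp(now_micros // 1_000_000, tz=timezone.utc)
    found = []
    for _ in range(max_days):  # hypothetical cutoff so the walk always ends
        row = userline.get(PREFIX + day.strftime("%Y%m%d"), {})
        for ts in sorted(row, reverse=True):  # newest column first
            found.append(row[ts])
            if len(found) == limit:
                return found
        day -= timedelta(days=1)
    return found

userline = {}
add_tweet(userline, 1267414247561777, "7561a442-24e2-11df-8924-001ff3591711")
add_tweet(userline, 1267414319522925, "02ccb5ec-24e3-11df-8924-001ff3591711")
print(latest(userline, now_micros=1267414319522925, limit=10))
```

Because the reader can compute each row key from the current date, it never
needs to know the start and finish keys in advance; it starts at today's row
and steps back one day at a time until it has enough columns.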

-Brandon
