cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: revisioned data
Date Sun, 06 Feb 2011 04:51:00 GMT
Using supercolumns to contain versions is reasonable, as long as the
number of versions is not too large.
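
As a rough illustration of the layout discussed in this thread, here is a minimal in-memory sketch in Python. It is not Cassandra client code: the class `VersionedStore` and its methods are hypothetical names, and a sorted list stands in for Cassandra keeping the columns of a row sorted by name. It assumes update times are stored as integers (so they sort numerically rather than as strings) and treats the as-of bound as inclusive. In Cassandra itself, the same lookup would roughly correspond to a reversed column slice starting at the as-of time with a count of 1.

```python
from bisect import bisect_right, insort


class VersionedStore:
    """Hypothetical stand-in for one container per object ID whose
    columns (versions) are kept sorted by update time."""

    def __init__(self):
        # object_id -> (sorted list of update times, {update_time: value})
        self._rows = {}

    def put(self, object_id, update_time, value):
        """Store a new version; rewriting an existing update time overwrites it."""
        times, values = self._rows.setdefault(object_id, ([], {}))
        if update_time not in values:
            insort(times, update_time)  # keep version names sorted
        values[update_time] = value

    def as_of(self, object_id, as_of_time):
        """Return the latest version with update_time <= as_of_time, or None."""
        times, values = self._rows.get(object_id, ([], {}))
        i = bisect_right(times, as_of_time)
        if i == 0:
            return None  # object did not exist yet at that time
        return values[times[i - 1]]

    def all_as_of(self, as_of_time):
        """Return {object_id: version} for every object that existed at as_of_time."""
        result = {}
        for object_id in self._rows:
            version = self.as_of(object_id, as_of_time)
            if version is not None:
                result[object_id] = version
        return result


store = VersionedStore()
store.put(625, 1000, b"v1")
store.put(625, 2000, b"v2")
store.put(626, 1500, b"other")
snapshot = store.all_as_of(1200)  # only object 625 existed at time 1200
```

Each lookup touches a single version per object, which is why the number of versions per object matters less for reads here than it would if the whole container had to be loaded at once.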

On Sat, Feb 5, 2011 at 4:38 PM, Victor Kabdebon
<victor.kabdebon@gmail.com> wrote:
> Hello Raj,
>
> No, it actually doesn't make sense from Cassandra's point of view:
> OrderingPartitioner preserves the order of the keys. The ordering will be done
> according to the supercolumn name. In that case you can set the ordering
> with compare_super_with (sorry, I don't remember the exact new term in
> Cassandra, but that's the idea). compare_with will order your columns
> inside your supercolumn.
>
> However, and I think that many will agree here, try to avoid SuperColumns.
> Rather than using SuperColumns, try to think of it like this:
>
> CF1 : "ObjectStore"
> Key :ID (long)
> Columns : {
>     name
>     other fields
>     update time (long [date])
>     ...}
>
> CF2 : "ObjectOrder"
> Key : "myorderedobjects"
> Column:{
>    { name : identifier that can be sorted
>    value :ObjectID},
>    ...
> }
>
> Best regards,
> Victor Kabdebon,
> http://www.voxnucleus.fr
>
> 2011/2/5 Raj Bakhru <rbakhru@gmail.com>
>>
>> Hi all -
>>
>> We're new to Cassandra and have read plenty on the data model, but we
>> wanted to poll for thoughts on how to best handle this structure.
>>
>> We have simple objects that have an ID and we want to maintain a history
>> of all the revisions.
>>
>> e.g.
>> MyObject:
>>     ID (long)
>>     name
>>     other fields
>>     update time (long [date])
>>
>>
>> Any time the object changes, we'll store a new version of the object
>> (same ID, but different update time and other fields).  We need to be able
>> to query what the object was as of any time historically.  We also need
>> to be able to query what some or all of the items of this object type
>> were as of any time historically.
>>
>> In SQL, we'd just find the max(update time) where update time < queried_as_of_time
>>
>> In Cassandra, we were thinking of modeling as follows:
>>
>> CF:  MyObjectType
>> Super-Column: ID of object (e.g. 625)
>> Column:  updatetime  (e.g. "1000245242")
>> Value: byte[] of serialized object
>>
>> We were thinking of using the OrderingPartitioner and using range queries
>> against the data.
>>
>> Does this make sense?  Are we approaching this in the wrong way?
>>
>> Thanks a lot
>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
