cassandra-user mailing list archives

From Daniel Lundin <...@eintr.org>
Subject Re: Is there any way to store muti-version data based on the timestamp?
Date Wed, 01 Dec 2010 12:56:13 GMT
> I ran some tests to see whether Cassandra can store multiple versions
> of the same data, but from the test code below it seems it can only
> store one version, which is different from HBase.
> Can somebody confirm this?

Correct. Unlike BigTable and HBase, Cassandra columns don't have a
version dimension.
The timestamp is used only for (crude) conflict resolution: the write
with the newest timestamp wins, and older values are always overwritten.
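
To make the overwrite behaviour concrete, here is a toy sketch in plain
Python. It is not Cassandra's API; the Column class and its write method
are made up for illustration, and tie-breaking on equal timestamps is
simplified.

```python
# Toy in-memory model of one column (hypothetical, not Cassandra code).
class Column:
    def __init__(self):
        self.value = None
        self.timestamp = None

    def write(self, value, timestamp):
        # Keep the write only if it is at least as new as the stored one.
        # Simplification: on equal timestamps the later call wins here.
        if self.timestamp is None or timestamp >= self.timestamp:
            self.value = value
            self.timestamp = timestamp


col = Column()
col.write("first", 100)
col.write("second", 200)
col.write("late arrival", 150)  # older timestamp, so it is discarded
print(col.value, col.timestamp)
```

Only one value survives per column, so there is nothing to read back as
history.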

> I would very much appreciate it if someone could suggest how to use
> Cassandra to store multi-version data efficiently.

One way is using supercolumns with subcolumns as versions:

 foo => { bar => {v1: data, v2: data, v3: data} ... }
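
As a rough picture of that layout, the sketch below models it with nested
Python dicts. It only illustrates the shape of the data; row_store,
put_version and get_versions are hypothetical names, not client calls, and
integer version numbers are an assumption.

```python
# row key -> supercolumn name -> subcolumn (version) -> data
row_store = {}


def put_version(row_key, name, version, data):
    # Each version is its own subcolumn, so writes never overwrite
    # an older version.
    row_store.setdefault(row_key, {}).setdefault(name, {})[version] = data


def get_versions(row_key, name):
    # Return (version, data) pairs ordered by version number.
    return sorted(row_store[row_key][name].items())


for v in (1, 2, 3):
    put_version("foo", "bar", v, f"data-{v}")

versions = get_versions("foo", "bar")
latest = versions[-1]
print(versions)
print(latest)
```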

You could also use a standard column family, composing the version
into the column name:

 foo => { bar:v1 => data, bar:v2 => data, bar:v3 => data }
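
A small Python sketch of composing and reading such names follows. The
zero-padding is my assumption, not part of the scheme above: it keeps
plain string ordering in line with numeric version order (otherwise
"bar:v10" sorts before "bar:v2"). column_name and versions_of are
hypothetical helpers.

```python
def column_name(name, version, width=10):
    # Zero-pad the version so lexical order equals numeric order.
    return f"{name}:v{version:0{width}d}"


def versions_of(row, name):
    # Collect (version, data) for one logical column, in column-name order.
    prefix = name + ":v"
    return [
        (int(key[len(prefix):]), value)
        for key, value in sorted(row.items())
        if key.startswith(prefix)
    ]


row = {}
for v in (1, 2, 10):
    row[column_name("bar", v)] = f"data-{v}"
row[column_name("baz", 1)] = "other"

bar_versions = versions_of(row, "bar")
print(bar_versions)
```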

Here there's a retrieval cost, of course: a slice returns every stored
version of each column it covers, so whether this works depends on your
access pattern. If you do large slices, it's probably not an option. It
could be feasible to write a custom comparator that sorts on the version
component, to allow efficient slicing of just the "latest" versions.
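
The ordering such a comparator might impose can be sketched in Python.
This is only the comparison logic, not a Cassandra comparator class; the
"name:vN" format and the parse and compare helpers are assumptions for
illustration.

```python
from functools import cmp_to_key


def parse(column_name):
    # Split "bar:v10" into ("bar", 10).
    name, _, version = column_name.rpartition(":v")
    return name, int(version)


def compare(a, b):
    name_a, ver_a = parse(a)
    name_b, ver_b = parse(b)
    if name_a != name_b:
        return -1 if name_a < name_b else 1
    # Same name: the higher version sorts first.
    return (ver_b > ver_a) - (ver_b < ver_a)


columns = ["bar:v1", "bar:v10", "bar:v2", "baz:v1"]
ordered = sorted(columns, key=cmp_to_key(compare))

# With newest-first ordering, the "latest" versions are a short prefix.
latest_two = [c for c in ordered if parse(c)[0] == "bar"][:2]
print(ordered)
print(latest_two)
```

With this order, reading the newest versions of a name means taking the
first few columns of its range rather than scanning to the end.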

But first, reach for supercolumns.

/d
