incubator-cassandra-user mailing list archives

From Bryce Allen <bal...@ci.uchicago.edu>
Subject Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys
Date Tue, 20 Dec 2011 14:06:09 GMT
I think it comes down to how much you benefit from row range scans, and
how confident you are that going forward all data will continue to use
random row keys.

I'm considering using BOP as a way of working around the limitation
that super column subcolumns are not indexed (reading any subcolumn
deserializes the whole super column). In my current schema, row keys
are random UUIDs, super column names are timestamps, and the columns
contain a snapshot in time of directory contents, which can be quite
large. If I instead use row keys of the form (uuid)-(timestamp) in a
standard column family, I can do a row range query and select only
specific columns. I'm still evaluating whether I can do this with BOP:
ideally the token would use just the first 128 bits of the key, and I
haven't found any documentation on how it compares keys of different
lengths.
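A minimal Java sketch of the (uuid)-(timestamp) key idea. It assumes the
key is stored as raw bytes (16 UUID bytes followed by an 8-byte
big-endian timestamp, with no literal "-" separator) and assumes BOP
orders keys by unsigned lexicographic byte comparison, with a key that
is a strict prefix of another sorting first. That ordering is exactly
the undocumented point above, so treat it as an assumption to check
against the Cassandra source. All class and method names are
hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class CompositeKeySketch {
    // Hypothetical helper: the 16 raw bytes of a UUID, big-endian.
    static byte[] uuidBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Hypothetical helper: (uuid)-(timestamp) as 16 UUID bytes followed by an
    // 8-byte big-endian timestamp. Non-negative timestamps keep numeric order
    // identical to unsigned byte order.
    static byte[] compositeKey(UUID id, long timestampMillis) {
        return ByteBuffer.allocate(24)
                .put(uuidBytes(id))
                .putLong(timestampMillis)
                .array();
    }

    // ASSUMED key ordering: unsigned lexicographic byte comparison, where a
    // key that is a strict prefix of another sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        UUID dir = UUID.randomUUID();
        byte[] earlier = compositeKey(dir, 1_000L);
        byte[] later = compositeKey(dir, 2_000L);
        byte[] bareUuid = uuidBytes(dir);

        // Snapshots of the same directory sort by timestamp...
        System.out.println(compareUnsigned(earlier, later) < 0); // true
        // ...and the bare UUID sorts before all of them, so under the assumed
        // ordering it could serve as the start key of a row range scan.
        System.out.println(compareUnsigned(bareUuid, earlier) < 0); // true
    }
}
```

Under this assumption, every snapshot of one directory is contiguous in
key order, and only the leading 16 random bytes decide where that run
lands on the ring.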

Another trick with BOP is to use MD5(rowkey)-rowkey for data that has
non-uniform row keys. I think it's reasonable if most of the data is
uniform and benefits from range scans, but some data is added whose
keys aren't uniform or that doesn't benefit from range scans. The trick
does make the keys larger (16 extra bytes per key for a raw MD5 digest,
32 if hex-encoded), which increases storage cost and IO load, so it's
probably a bad idea if a significant subset of the data requires it.
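A minimal sketch of the MD5(rowkey)-rowkey trick in Java, assuming keys
are handled as raw bytes. The "-" in the post may be a literal
separator; here it is omitted because the 16-byte digest has a fixed
length, so the original key can be split off again without one. The
class and method names are hypothetical; only
java.security.MessageDigest is a real API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class HashPrefixSketch {
    static final int MD5_LENGTH = 16;

    // Hypothetical helper: stored key = MD5(rowKey) followed by rowKey itself.
    static byte[] hashPrefixed(byte[] rowKey) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(rowKey);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
        byte[] out = new byte[digest.length + rowKey.length];
        System.arraycopy(digest, 0, out, 0, digest.length);
        System.arraycopy(rowKey, 0, out, digest.length, rowKey.length);
        return out;
    }

    // Recover the application's original key by dropping the digest prefix.
    static byte[] originalKey(byte[] stored) {
        return Arrays.copyOfRange(stored, MD5_LENGTH, stored.length);
    }

    public static void main(String[] args) {
        byte[] key = "dir:/var/log".getBytes(StandardCharsets.UTF_8);
        byte[] stored = hashPrefixed(key);
        System.out.println(stored.length - key.length); // 16 bytes of overhead
        System.out.println(new String(originalKey(stored), StandardCharsets.UTF_8)); // dir:/var/log
    }
}
```

The leading digest bytes spread these keys uniformly across a BOP ring,
at the cost of losing any meaningful range-scan order for them.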

Disclaimer: I wrote that wiki article to fill in a documentation gap,
since there were no examples of BOP, and I wasted a lot of time before
I noticed the distinction between hex byte arrays (BOP) and decimal
integers (RandomPartitioner) for specifying initial tokens (which, to
be fair, is documented, just easy to miss on a skim). I'm also new to
Cassandra; I'm just describing what makes sense to me "on paper". FWIW,
I confirmed that random (version 4) UUID row keys really do distribute
evenly when using BOP.
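To illustrate the hex vs. decimal token distinction and the even
distribution of random UUIDs, here is a hedged Java sketch for a
hypothetical 4-node ring. RandomPartitioner tokens are decimal integers
in [0, 2^127); the BOP tokens here are assumed to be 16-byte values,
hex-encoded and evenly spaced over the unsigned 128-bit key space (the
kind of value one would put in initial_token). The sampling loop counts
how many random UUIDs fall between consecutive BOP tokens; it simulates
the idea and is not the check the poster actually ran. Cassandra gives
each node the range ending at its token, so ownership is shifted by one
slice relative to these counts, but the evenness is the same.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

public class TokenSketch {
    // RandomPartitioner style: decimal integer i * 2^127 / nodes.
    static String randomPartitionerToken(int i, int nodes) {
        return BigInteger.ONE.shiftLeft(127)
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(nodes))
                .toString();
    }

    // BOP style (assumed 16-byte tokens): i * 2^128 / nodes as 32 hex digits.
    static String byteOrderedToken(int i, int nodes) {
        BigInteger t = BigInteger.ONE.shiftLeft(128)
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(nodes));
        return String.format("%032x", t);
    }

    // Index of the evenly spaced slice that a UUID's raw bytes fall into.
    static int slice(UUID id, int nodes) {
        byte[] raw = ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
        BigInteger k = new BigInteger(1, raw); // interpret bytes as unsigned
        return k.multiply(BigInteger.valueOf(nodes)).shiftRight(128).intValue();
    }

    // Count how many random (version 4) UUIDs land in each slice.
    static int[] sliceCounts(int samples, int nodes) {
        int[] counts = new int[nodes];
        for (int n = 0; n < samples; n++) {
            counts[slice(UUID.randomUUID(), nodes)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int nodes = 4;
        for (int i = 0; i < nodes; i++) {
            System.out.println("node " + i
                    + "  RandomPartitioner=" + randomPartitionerToken(i, nodes)
                    + "  BOP=" + byteOrderedToken(i, nodes));
        }
        System.out.println(Arrays.toString(sliceCounts(100_000, nodes)));
    }
}
```

The version nibble of a version 4 UUID sits in byte 6, so the leading
bytes that decide the slice are fully random, which is why the counts
come out close to equal.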

-Bryce

On Mon, 19 Dec 2011 19:01:00 -0800
Drew Kutcharian <drew@venarc.com> wrote:
> Hey Guys,
> 
> I just came across
> http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me
> thinking. If the row keys are java.util.UUID which are generated
> randomly (and securely), then what type of partitioner would be the
> best? Since the key values are already random, would it make a
> difference to use RandomPartitioner, or could one use
> ByteOrderedPartitioner or OrderPreservingPartitioner just as well and
> get the same result?
> 
> -- Drew
> 
