incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys
Date Tue, 20 Dec 2011 19:08:16 GMT
Bryce,
	Have you considered using CompositeColumns and a standard CF? The row key is the UUID
and the column name is (timestamp : dir_entry); you can then slice all columns with a
particular timestamp.
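
As a rough illustration of that layout, here is a minimal sketch in plain Python. It only models the idea (one row per UUID, columns kept sorted by a (timestamp, dir_entry) name); it is not a Cassandra client API. The Row class, slice_timestamp, and the integer timestamps are assumptions made for the example.

```python
import bisect
import uuid


class Row:
    """Toy model of one row: columns kept sorted by (timestamp, dir_entry)."""

    def __init__(self):
        self._names = []    # sorted composite column names
        self._values = {}   # composite name -> column value

    def insert(self, timestamp, dir_entry, value):
        name = (timestamp, dir_entry)
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice_timestamp(self, timestamp):
        # (t,) sorts before every (t, entry), and (t + 1,) sorts after them,
        # so the two bounds bracket exactly the columns for one timestamp.
        lo = bisect.bisect_left(self._names, (timestamp,))
        hi = bisect.bisect_left(self._names, (timestamp + 1,))
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical data: one directory (row key) with two snapshots.
column_family = {}
row_key = uuid.UUID("11111111-2222-4333-8444-555555555555")
row = column_family.setdefault(row_key, Row())
row.insert(1000, "b.txt", "meta-b")
row.insert(1000, "a.txt", "meta-a")
row.insert(2000, "a.txt", "meta-a2")

snapshot = row.slice_timestamp(1000)  # all entries recorded at timestamp 1000
```

The point of the composite name is that every entry for one timestamp is contiguous within the row, so a single slice returns a whole snapshot without scanning other timestamps.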


	Even if you have a random key, I would use the RandomPartitioner (RP) unless you have an extreme use case. 

 Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/12/2011, at 3:06 AM, Bryce Allen wrote:

> I think it comes down to how much you benefit from row range scans, and
> how confident you are that going forward all data will continue to use
> random row keys.
> 
> I'm considering using BOP as a way of working around the non-indexed
> super column limitation. In my current schema, row keys are random
> UUIDs, super column names are timestamps, and columns contain a
> snapshot in time of directory contents, and could be quite large. If
> instead I use row keys that are (uuid)-(timestamp), and use a standard
> column family, I can do a row range query and select only specific
> columns. I'm still evaluating if I can do this with BOP - ideally the
> token would just use the first 128 bits of the key, and I haven't found
> any documentation on how it compares keys of different length.
> 
> Another trick with BOP is to use MD5(rowkey)-rowkey for data that has
> non-uniform row keys. I think it's reasonable to use if most data is
> uniform and benefits from range scans, but a few things are added that
> are not uniform or do not benefit from range scans. This trick does make
> the keys larger, which increases storage cost and IO load, so it's
> probably a bad idea if a significant subset of the data requires it.
> 
> Disclaimer - I wrote that wiki article to fill in a documentation gap,
> since there were no examples of BOP and I wasted a lot of time before I
> noticed the hex byte array vs decimal distinction for specifying the
> initial tokens (which to be fair is documented, just easy to miss on a
> skim). I'm also new to Cassandra; I'm just describing what makes sense
> to me "on paper". FWIW I confirmed that random (type 4) UUID row keys
> really do distribute evenly when using BOP.
> 
> -Bryce
> 
> On Mon, 19 Dec 2011 19:01:00 -0800
> Drew Kutcharian <drew@venarc.com> wrote:
>> Hey Guys,
>> 
>> I just came across
>> http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me
>> thinking. If the row keys are java.util.UUID values that are generated
>> randomly (and securely), then what type of partitioner would be the
>> best? Since the key values are already random, does it make a
>> difference whether one uses RandomPartitioner, or could one use
>> ByteOrderedPartitioner or OrderPreservingPartitioner as well and get
>> the same result?
>> 
>> -- Drew
>> 
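
For readers following the thread, the two key layouts Bryce describes, (uuid)-(timestamp) and MD5(rowkey)-rowkey, can be sketched in plain Python. This is only a model: it assumes keys are ordered by unsigned lexicographic byte comparison (which is how Python orders bytes objects); how BOP actually treats keys of different lengths is the open question in the thread and is not verified here. The function names and sample values are hypothetical.

```python
import hashlib
import struct
import uuid


def snapshot_key(dir_id: uuid.UUID, timestamp_ms: int) -> bytes:
    # 16 raw UUID bytes followed by an 8-byte big-endian timestamp, so that
    # byte order matches time order for non-negative timestamps.
    return dir_id.bytes + struct.pack(">Q", timestamp_ms)


def md5_prefixed_key(row_key: bytes) -> bytes:
    # MD5(rowkey)-rowkey: a 16-byte digest prefix, a dash, then the original key.
    return hashlib.md5(row_key).digest() + b"-" + row_key


dir_a = uuid.UUID("11111111-2222-4333-8444-555555555555")
dir_b = uuid.UUID("99999999-2222-4333-8444-555555555555")

keys = [snapshot_key(dir_b, 5), snapshot_key(dir_a, 300), snapshot_key(dir_a, 7)]
ordered = sorted(keys)  # unsigned lexicographic order over the raw bytes

# Under this ordering, all snapshots of dir_a fall in one contiguous key range.
start = dir_a.bytes
end = dir_a.bytes + b"\xff" * 8
in_range = [k for k in ordered if start <= k <= end]

# Non-uniform keys get a digest prefix; the original key is kept as a suffix.
prefixed = [md5_prefixed_key(k) for k in (b"user-0001", b"user-0002")]
```

Each prefixed key is 17 bytes longer than the original, which is the storage and IO cost mentioned above.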

