incubator-cassandra-user mailing list archives

From Peter Hsu <pe...@motivecast.com>
Subject Re: Order Preserving Partitioner
Date Wed, 26 May 2010 17:51:46 GMT
Correct me if I'm wrong here.  Even though you can get your results with Random Partitioner,
it's a lot less efficient if you're going across different machines to get your results. 
If you're doing a lot of range queries, it makes sense to have things ordered sequentially
so that if you do need to go to disk, the reads will be faster, rather than lots of random
reads across your system.

It's also my understanding that if you go with the OPP, you could hash your keys yourself using
MD5 or SHA-1 to effectively get random partitioning.  So it's a bit of a pain, but not impossible,
to split between OPP and RP behavior for your different column families or keyspaces.
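
A rough sketch of what I mean by hashing the key yourself, in Python. This is only an
illustration: the function names and the "digest:key" layout are my own assumptions, not
anything Cassandra provides. The client decides per column family whether to store the raw
key (so OPP keeps it in order) or an MD5-prefixed key (so rows scatter roughly the way RP
would scatter them).

```python
import hashlib


def hashed_key(row_key: str) -> str:
    """Return the row key prefixed with its MD5 hex digest.

    Under an order-preserving partitioner, rows sort by the digest prefix,
    which spreads them roughly uniformly. The original key is kept after
    the colon so it can still be read back (hypothetical layout).
    """
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return digest + ":" + row_key


def storage_key(row_key: str, keep_order: bool) -> str:
    """Pick the key actually written to the cluster.

    keep_order=True  -> raw key, range scans come back in key order.
    keep_order=False -> hashed key, random-style placement, no useful order.
    """
    return row_key if keep_order else hashed_key(row_key)


def original_key(stored: str) -> str:
    """Recover the application key from a hashed storage key."""
    return stored.split(":", 1)[1]
```

The obvious cost is that range scans over the hashed column families no longer mean
anything in terms of the original keys, so this only makes sense for data you look up by
exact key.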

On May 26, 2010, at 2:32 AM, David Boxenhorn wrote:

> Just in case you don't know: You can do range searches on keys even with Random Partitioner,
> you just won't get the results in order. If this is good enough for you (e.g. if you can order
> the results on the client, or if you just need to get the right answer, but not the right
> order), then you should use Random Partitioner.
> 
> (I bring this up because it confused me until recently.) 
> 
> On Wed, May 26, 2010 at 5:14 AM, Steve Lihn <stevelihn@gmail.com> wrote:
> I have a question on using Order Preserving Partitioner. 
> 
> Many rowKeys in my system will be related to dates, so it seems natural to use Order
> Preserving Partitioner instead of the default Random Partitioner. However, I have been warned
> that special attention is needed for Order Preserving Partitioner to work properly
> (basically to ensure a good key distribution and avoid "hot spots"), and that reverting to
> Random may not be easy. Also, not every rowKey is related to dates; for those, Random
> Partitioner is fine, but there is only one place to set the Partitioner.
> 
> (Note: The intention of this warning is actually to discredit Cassandra and persuade
> me not to use it.)
> 
> It seems the choice of Partitioner is defined in storage-conf.xml and is a global
> property. My question: why does it have to be a global property? Is there a future plan to
> make it customizable per KeySpace (just like you would choose hash or range partitioning for
> different tables/data in an RDBMS)?
> 
> Thanks,
> Steve 
> 

