cassandra-user mailing list archives

From "Bingbing Liu" <rucb...@gmail.com>
Subject Re: Re: when i use the OrderPreservingPartition, the load is veryimbalance
Date Mon, 26 Apr 2010 10:37:20 GMT
Thank you so much for your help!


2010-04-26 



Bingbing Liu 



From: Mark Robson
Sent: 2010-04-26 18:17:53
To: user
Cc:
Subject: Re: when i use the OrderPreservingPartition, the load is veryimbalance
 
On 26 April 2010 01:18, 刘兵兵 <rucbing@gmail.com> wrote:

I am doing some INSERTs. Because I will also do some scan operations, I use the
OrderPreservingPartitioner.

The state of the cluster is shown below.

As I predicted, the load is very imbalanced.




I think the solution to this would be to choose your nodes' tokens wisely before you start
inserting data, and if possible, modify the keys to split them better between the nodes.


For example, suppose your key has two parts, one of which you want to range scan and another
which you don't. Say you have a customer_id and a timestamp. The customer ID does not need to be
range scanned, so you can hash it into a hex value (say), then append the timestamp (in a
lexically sortable way, of course). So you'd end up with keys like:


HHHH-0012345-0001234567890


Where HHHH is a hash of the customer ID, 0012345 is the customer ID, and the rest is a timestamp.
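
As a rough sketch of how such keys could be built (not code from this thread): the snippet
below assumes MD5 truncated to 4 hex digits as the hash, a 7-digit zero-padded customer ID and
a 13-digit zero-padded timestamp, matching the widths in the example key. The function names
make_key and time_range_bounds are hypothetical.

```python
import hashlib


def make_key(customer_id: int, timestamp: int) -> str:
    """Build a key of the form HHHH-<customer id>-<timestamp>."""
    # Assumption: MD5 truncated to 4 hex digits. The email does not name a hash;
    # any stable, well-mixed hash of the customer ID would serve.
    prefix = hashlib.md5(str(customer_id).encode("utf-8")).hexdigest()[:4]
    # Zero-padding to a fixed width makes string order match numeric order.
    return f"{prefix}-{customer_id:07d}-{timestamp:013d}"


def time_range_bounds(customer_id: int, start_ts: int, end_ts: int) -> tuple[str, str]:
    """Return (start_key, end_key) for a time range scan over one customer."""
    # Both keys share the same hash and customer prefix, so everything between
    # them belongs to that customer and falls within the time window.
    return make_key(customer_id, start_ts), make_key(customer_id, end_ts)


key = make_key(12345, 1234567890)
print(key[4:])  # → -0012345-0001234567890
```

The two bounds would then be passed as the start and end keys of whatever range query the
client library offers.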


You'd be able to do a time range scan by using the known prefixes, and distributing your nodes'
tokens equally from 0000 to ffff would result in a fairly even spread of data (provided you
don't have a very small number of very large customers).
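
To illustrate the token spacing, here is a minimal sketch that computes evenly spaced 4-digit
hex tokens for a given number of nodes. The function name even_tokens is hypothetical, and how
each token is actually assigned to a node depends on your Cassandra version and configuration,
which is not shown here.

```python
def even_tokens(node_count: int, width: int = 4) -> list[str]:
    """Return node_count tokens spaced evenly over the hex prefix space."""
    space = 16 ** width  # 0000..ffff holds 65536 values when width is 4
    # Integer division keeps tokens inside the space; format pads to fixed width.
    return [format(i * space // node_count, f"0{width}x") for i in range(node_count)]


print(even_tokens(4))  # → ['0000', '4000', '8000', 'c000']
```

Because the key prefix is a hash, each node should then own roughly the same share of
customers, subject to the caveat above about a few very large customers.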


Mark