incubator-cassandra-user mailing list archives

From Mark Robson <mar...@gmail.com>
Subject Re: when i use the OrderPreservingPartition, the load is very imbalance
Date Mon, 26 Apr 2010 10:17:22 GMT
On 26 April 2010 01:18, 刘兵兵 <rucbing@gmail.com> wrote:

> I am doing some INSERTs. Because I will do some scan operations, I use the
> OrderPreservingPartition method.
>
> The state of the cluster is shown below.
>
> As I predicted, the load is very imbalanced.



I think the solution to this would be to choose your nodes' tokens wisely
before you start inserting data, and if possible, modify the keys to split
them better between the nodes.

For example, suppose your key has two parts, one of which you want to range
scan and one of which you don't. Say you have a customer_id and a timestamp.
The customer ID does not need to be range scanned, so you can hash it into a
hex value (say), then append the timestamp (in a lexically sortable way, of
course). So you'd end up with keys like

HHHH-0012345-0001234567890

Where HHHH is a hash of the customer ID, 0012345 is the customer ID, and the
rest is a timestamp.
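
A minimal sketch of building such keys in Python. The choice of MD5, the
4-character prefix, and the field widths (7 digits for the customer ID, 13 for
the timestamp, matching the example key above) are assumptions for
illustration; make_key is a hypothetical helper, not part of any Cassandra API.

```python
import hashlib


def make_key(customer_id: int, timestamp: int) -> str:
    """Build a row key of the form HHHH-<customer id>-<timestamp>."""
    # Assumption: the first 4 hex characters of an MD5 digest serve as the
    # hash prefix. Any hash that spreads customer IDs evenly would do.
    digest = hashlib.md5(str(customer_id).encode("ascii")).hexdigest()
    prefix = digest[:4]
    # Zero-pad both numeric fields so that string order matches numeric order.
    return f"{prefix}-{customer_id:07d}-{timestamp:013d}"


if __name__ == "__main__":
    print(make_key(12345, 1234567890))
```

Because both numeric fields are zero-padded to a fixed width, keys for one
customer sort by timestamp when compared as strings.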

You'd be able to do a time range scan for a given customer by using the known
prefix (the hash plus the customer ID), and spacing your nodes' tokens evenly
from 0000 to ffff would result in a fairly even data distribution (provided
you don't have a very small number of very large customers).
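
As a rough sketch of both ideas in Python: even_tokens spaces node tokens
evenly over the 4-hex-digit prefix space, and scan_bounds gives the start and
end keys for a time range scan of one customer. Both function names are
hypothetical, and the MD5 prefix and field widths are assumptions carried over
from the key layout above. How you assign the tokens to nodes and issue the
range query depends on your Cassandra version and client, so that part is left
out.

```python
import hashlib


def even_tokens(node_count: int) -> list:
    """Return node_count tokens spaced evenly from 0000 up to ffff."""
    space = 0x10000  # number of distinct 4-hex-digit prefixes
    return [f"{(i * space) // node_count:04x}" for i in range(node_count)]


def scan_bounds(customer_id: int, start_ts: int, end_ts: int) -> tuple:
    """Return (start_key, end_key) covering one customer's time range."""
    # Assumption: same prefix scheme as the key layout, i.e. the first
    # 4 hex characters of the MD5 digest of the customer ID.
    digest = hashlib.md5(str(customer_id).encode("ascii")).hexdigest()
    head = f"{digest[:4]}-{customer_id:07d}-"
    # Zero-padded timestamps keep the bounds in lexical order.
    return head + f"{start_ts:013d}", head + f"{end_ts:013d}"


if __name__ == "__main__":
    print(even_tokens(4))
    print(scan_bounds(12345, 1000, 2000))
```

For four nodes, even_tokens gives 0000, 4000, 8000 and c000. Every key between
the two bounds shares the same hash prefix, so the scan stays within one
contiguous slice of the key space.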

Mark
