incubator-cassandra-user mailing list archives

From Schubert Zhang <zson...@gmail.com>
Subject Re: when i use the OrderPreservingPartition, the load is very imbalance
Date Mon, 26 Apr 2010 10:27:25 GMT
When starting your Cassandra cluster, please configure an InitialToken for
each node so that the key ranges are balanced.
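
As a rough sketch of what balanced tokens could look like: assuming keys begin
with a four-digit hex prefix (as in Mark's scheme quoted below) and that tokens
under the order-preserving partitioner are plain strings in the key space,
something like this would compute evenly spaced values. The function name and
the node count are only illustrative.

```python
def evenly_spaced_tokens(node_count, hex_digits=4):
    """Return node_count hex strings spread evenly over the prefix space.

    Assumption: row keys start with a fixed-width lowercase hex prefix,
    so the key space runs from '0000' to 'ffff' when hex_digits is 4.
    """
    space = 16 ** hex_digits
    return [
        format(i * space // node_count, "0{}x".format(hex_digits))
        for i in range(node_count)
    ]


if __name__ == "__main__":
    # One token per node; each value would be that node's InitialToken.
    for node, token in enumerate(evenly_spaced_tokens(4)):
        print(node, token)
```

Each value would go into one node's InitialToken setting before that node is
started for the first time.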

On Mon, Apr 26, 2010 at 6:17 PM, Mark Robson <markxr@gmail.com> wrote:

> On 26 April 2010 01:18, 刘兵兵 <rucbing@gmail.com> wrote:
>
>> I am doing some INSERTs, and because I will also do some scan operations,
>> I use the OrderPreservingPartitioner.
>>
>> The state of the cluster is shown below.
>>
>> As I predicted, the load is very imbalanced.
>
>
>
> I think the solution to this would be to choose your nodes' tokens wisely
> before you start inserting data, and if possible, modify the keys to split
> them better between the nodes.
>
> For example, suppose your key has two parts, one that you want to range
> scan and another that you don't. Say you have a customer_id and a timestamp.
> The customer ID does not need to be range scanned, so you can hash it into a
> hex value (say), then append the timestamp (in a lexically sortable way, of
> course). So you'd end up with keys like
>
> HHHH-0012345-0001234567890
>
> Where HHHH is a hash of the customer ID, 0012345 is the customer ID, and
> the rest is a timestamp.
>
> You'd be able to do a time range scan by using the known prefixes, and
> distributing your nodes' tokens equally from 0000 to ffff would result in
> fairly even data (provided you don't have a very small number of very large
> customers).
>
> Mark
>
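
A minimal sketch of the key scheme Mark describes, in Python. The choice of MD5
for the hash, the field widths (7 digits for the customer ID and 13 for the
timestamp, taken from the example key), and the function names are assumptions
for illustration; the thread does not specify them.

```python
import hashlib


def make_row_key(customer_id, timestamp):
    """Build a key shaped like HHHH-0012345-0001234567890.

    HHHH is the first four hex digits of a hash of the customer ID
    (MD5 here, as an assumption); the remaining fields are zero-padded
    so that string order matches numeric order.
    """
    prefix = hashlib.md5(str(customer_id).encode("utf-8")).hexdigest()[:4]
    return "{}-{:07d}-{:013d}".format(prefix, customer_id, timestamp)


def scan_bounds(customer_id, start_ts, end_ts):
    """Return the (start_key, end_key) pair for one customer's time range."""
    return (
        make_row_key(customer_id, start_ts),
        make_row_key(customer_id, end_ts),
    )


if __name__ == "__main__":
    print(make_row_key(12345, 1234567890))
    print(scan_bounds(12345, 1234560000, 1234569999))
```

Because the prefix depends only on the customer ID, all of one customer's rows
sort next to each other, and the two bounds can be passed as the start and end
keys of a range scan.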
