cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: unbalanced token assignment with random partitioner
Date Thu, 20 May 2010 14:32:23 GMT
Yes, if you add nodes when the existing one doesn't have enough data
to guess a good token from the keys it has, it uses a random token.
Created to use
midpoint instead.
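
As a rough illustration of the midpoint idea (a sketch only, not the actual Cassandra change), the snippet below picks the token halfway along an existing node's range. The `midpoint_token` helper and the `RING_SIZE` constant are hypothetical names, and the token space is assumed to be [0, 2**127), which matches the 1.7E38 upper bound quoted below.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space: [0, 2**127)


def midpoint_token(left, right):
    """Return the token halfway along the range (left, right].

    The range may wrap past the end of the ring. If left == right, the
    range is treated as the whole ring (a single-node cluster).
    """
    width = (right - left) % RING_SIZE or RING_SIZE
    return (left + width // 2) % RING_SIZE


# A single existing node owns the whole ring, so a second node placed at
# the midpoint lands exactly half a ring away from it.
existing = 69518187202527923173412511728767069233
new_token = midpoint_token(existing, existing)
print(new_token)
```

With two nodes placed this way, each would own half of the token space, regardless of how little data the first node holds.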

On Mon, May 17, 2010 at 4:06 PM, Chris Shorrock <> wrote:
> I have a feeling this issue may be more misunderstanding than anything else,
> but after searching for an explanation in the wiki and elsewhere my
> understanding of token assignments leads me to believe that unbalancing is
> bound to occur.
> Given a relatively simple example if we take a 2 node cassandra setup with a
> random partitioner (letting Cassandra assign the tokens), we end up with a
> ring that looks like:
> Address       Status     Load          Range                                      Ring
>                                        69518187202527923173412511728767069233
>               Up         1023.44 MB    34433420789685454480210475042362028556     |<--|
>               Up         251.16 MB     69518187202527923173412511728767069233     |-->|
> Given my understanding of how data works based on the following wiki
> statement:
>> Each Cassandra server [node] is assigned a unique Token that determines
>> what keys it is the first replica for. If you sort all nodes' Tokens, the
>> Range of keys each is responsible for is (PreviousToken, MyToken], that is,
>> from the previous token (exclusive) to the node's token (inclusive). The
>> machine with the lowest Token gets both all keys less than that token, and
>> all keys greater than the largest Token; this is called a "wrapping Range."
> Given this description, this implies, in our example above, that the node
> with token 3.4E37 would serve keys 0 to 3.4E37 and 6.9E37 to 1.7E38 (the
> "wrapping Range"), while the node with token 6.9E37 serves 3.4E37 to 6.9E37.
> Given this, it seems that the node with token 3.4E37 would end up serving an
> uneven amount of data.
> This issue would of course be mitigated as the cluster grows - but it seems
> like the automatic initial selection of tokens isn't optimal.
> Is this a configuration issue, a misunderstanding, a new version of math
> I've developed, or?
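
The range arithmetic in the quoted message can be checked with a short sketch. It applies the wiki's (PreviousToken, MyToken] rule to the two tokens from the ring output. The `ownership` function and `RING_SIZE` constant are hypothetical names, and the token space is assumed to be [0, 2**127), consistent with the 1.7E38 figure above.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space: [0, 2**127)


def ownership(tokens):
    """Map each token to the fraction of the ring its node is first replica for.

    Each node owns (previous_token, its_token]; the node with the lowest
    token also owns the wrapping range past the largest token.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this wraps to the largest token
        # The modulo handles the wrapping range; a lone node owns everything.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares


tokens = [
    34433420789685454480210475042362028556,
    69518187202527923173412511728767069233,
]
shares = ownership(tokens)
for token, share in shares.items():
    print(f"{token}: {float(share):.1%}")
```

Under these assumptions the node with the lower token owns roughly four fifths of the ring and the other node roughly one fifth, which is in line with the 1023.44 MB and 251.16 MB loads reported in the ring output.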

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
