cassandra-user mailing list archives

From Chris Shorrock
Subject unbalanced token assignment with random partitioner
Date Mon, 17 May 2010 23:06:25 GMT
I have a feeling this issue may be more misunderstanding than anything else,
but after searching for an explanation in the wiki and elsewhere my
understanding of token assignments leads me to believe that unbalancing is
bound to occur.

As a relatively simple example, take a 2-node Cassandra setup with the
random partitioner (letting Cassandra assign the tokens). We end up with a
ring that looks like:

Address       Status     Load          Range                                      Ring
                                       69518187202527923173412511728767069233
              Up         1023.44 MB    34433420789685454480210475042362028556     |<--|
              Up         251.16 MB     69518187202527923173412511728767069233     |-->|

My understanding of how data is distributed is based on the following wiki
excerpt:

> Each Cassandra server [node] is assigned a unique Token that determines
> what keys it is the first replica for. If you sort all nodes' Tokens, the
> Range of keys each is responsible for is (PreviousToken, MyToken], that is,
> from the previous token (exclusive) to the node's token (inclusive). The
> machine with the lowest Token gets both all keys less than that token, and
> all keys greater than the largest Token; this is called a "wrapping Range."

This description implies that, in our example above, the node with token
3.4E37 would serve keys 0 to 3.4E37 and 6.9E37 to 1.7E38 (the "wrapping
Range"), while the node with token 6.9E37 serves only 3.4E37 to 6.9E37.
Given this, it seems that the first node would end up serving a
disproportionate share of the data.
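
The arithmetic can be checked with a short sketch. It assumes the random partitioner's token space runs from 0 to 2**127 (about 1.7E38, matching the figure above) and that keys hash uniformly across it. The function name `ownership` is hypothetical and is not part of Cassandra.

```python
from fractions import Fraction

# Assumption: RandomPartitioner tokens lie in 0 .. 2**127 (about 1.7E38).
RING_SIZE = 2 ** 127


def ownership(tokens):
    """Return {token: fraction of the ring in (previous_token, token]}."""
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        # For i == 0 this picks the largest token, which gives the
        # lowest-token node its "wrapping Range".
        previous = ordered[i - 1]
        width = (token - previous) % RING_SIZE
        if width == 0:
            # A single node owns the whole ring.
            width = RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares


# The two tokens from the ring output above.
tokens = [
    34433420789685454480210475042362028556,
    69518187202527923173412511728767069233,
]

for token, share in ownership(tokens).items():
    print(f"{token}: {float(share):.1%}")
```

Under those assumptions the lower token owns roughly 79% of the ring and the higher token roughly 21%, which is close to the observed load split of 1023.44 MB to 251.16 MB (about 80% to 20%).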

This issue would of course be mitigated as the cluster grows, but it seems
like the automatic initial selection of tokens isn't optimal.
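
For comparison, here is a minimal sketch of what evenly spaced tokens would look like. It assumes the same 0 to 2**127 token space and assumes tokens can be chosen by hand rather than left to Cassandra. The helper name `balanced_tokens` is hypothetical.

```python
# Assumption: RandomPartitioner tokens lie in 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return node_count tokens spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


for token in balanced_tokens(2):
    print(token)
```

With two nodes this yields tokens 0 and 2**126, so each range covers half of the token space instead of the roughly 79/21 split produced by the automatically chosen tokens above.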

Is this a configuration issue, a misunderstanding, a new version of math
I've developed, or?
