cassandra-user mailing list archives

From Benjamin Coverston <>
Subject Re: fixing unbalanced cluster !?
Date Thu, 09 Jun 2011 14:44:31 GMT
Because you were able to successfully run repair, you can follow up with 
a nodetool cleanup, which will get rid of some of the extraneous data on 
that (bigger) node. You're also assured after you run repair that 
entropy between the nodes is minimal.
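As a concrete sketch of that step (the host address here is a hypothetical placeholder, since the node addresses aren't shown in the thread):

```shell
# Cleanup deletes data this node no longer owns after ring/token
# changes. Run it on the oversized node once repair has completed.
nodetool -h <node-address> cleanup
```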

Assuming you're using the random partitioner: To balance your 
ring I would start by calculating the new token locations, then moving 
each of your nodes backwards along their owned range to their new locations.
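A sketch of the token calculation, assuming the random partitioner's 2^127 token space and this ring's 8 nodes (evenly spaced tokens starting at 0):

```python
# Evenly spaced tokens across the random partitioner's token space.
RING_SIZE = 2 ** 127  # token range used by Cassandra's RandomPartitioner

def balanced_tokens(node_count):
    """Return node_count evenly spaced tokens, starting at 0."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

for i, token in enumerate(balanced_tokens(8)):
    print(f"node {i}: {token}")
```

Note that the second token this produces, 21267647932558653966460912964485513216, matches the second token already in your ring output below, which is why those nodes don't need to move.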

 From the script, your new 
balanced tokens would be:


 From this you can see that 10.46.108.{100, 101} is already in the 
right place, so you don't have to do anything with those nodes. Proceed 
with moving to its new token; the safest way to do this is 
to use nodetool move. Another way to do it would be to run a 
removetoken followed by re-adding the node to the ring at its new 
location. The risk here is that if you do not at least run repair after 
re-joining the ring (and before you move the next node in the ring), 
some of the data on that node would be ignored, as it would now fall 
outside the owned range. So it's good practice to immediately run repair 
on a node that you do a removetoken / re-join on.
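A sketch of the two approaches (host addresses and token values here are hypothetical placeholders):

```shell
# Option 1 (safest): move the node directly to its new token.
nodetool -h <node-address> move <new-token>

# Option 2: remove the old token, re-bootstrap the node at its new
# location (set initial_token in the node's config before restarting),
# then repair immediately so data now falling outside the node's
# owned range isn't silently ignored.
nodetool -h <live-node> removetoken <old-token>
# ... restart the moved node with its new initial_token ...
nodetool -h <node-address> repair
```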

The rest of your balancing should be an iteration on the above steps 
moving through the range.

On 6/9/11 6:21 AM, Jonathan Colby wrote:
> I got myself into a situation where one node has a lot more data than
the other nodes.   In fact, the 1 TB disk on this node is almost full.  I added 3 new nodes
and let cassandra automatically calculate new tokens by splitting the ranges of the highest-loaded nodes. 
Unfortunately there is still a big token range this node is responsible for (5113... - 85070...).
 Yes, I know that one option would be to rebalance the entire cluster with move, but this is
an extremely time-consuming and error-prone process because of the amount of data involved.
> Our RF = 3 and we read/write at quorum.   The nodes have been repaired, so I think the data
should be in good shape.
> Question:    Can I get myself out of this mess without installing new nodes?    I was
thinking of either decommission or removetoken to have the cluster "rebalance itself", then
re-bootstrap this node to a new token.
> Address         Status State   Load            Owns    Token
>                                                         127605887595351923798765477786913079296
>   Up     Normal  218.52 GB       25.00%  0
>   Up     Normal  260.04 GB       12.50%  21267647932558653966460912964485513216
>   Up     Normal  286.79 GB       17.56%  51138582157040063602728874106478613120
>   Up     Normal  874.91 GB       19.94%  85070591730234615865843651857942052863
>   Up     Normal  302.79 GB       4.16%   92156241323118845370666296304459139297
>   Up     Normal  242.02 GB       4.16%   99241191538897700272878550821956884116
>   Up     Normal  439.9 GB        8.34%   113427455640312821154458202477256070484
>   Up     Normal  304 GB          8.33%   127605887595351923798765477786913079296

Ben Coverston
Director of Operations
DataStax -- The Apache Cassandra Company
