Date: Tue, 30 Apr 2013 10:50:15 +0000 (UTC)
From: "Richard Low (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5525) Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load

    [ https://issues.apache.org/jira/browse/CASSANDRA-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645445#comment-13645445 ]

Richard Low commented on CASSANDRA-5525:
----------------------------------------

Could you attach the output of 'nodetool ring' to list all the tokens? Also, what is your replication factor?

There is a balancing problem when adding new nodes without running shuffle (or decommissioning and bootstrapping each node).

When Cassandra increases the number of tokens from 1 to N (256 in your case), it splits each original range into N consecutive ranges. This doesn't change where the data lives, but it does increase the number of tokens. Cassandra knows that the adjacent tokens belong to the same node, so it doesn't try to store replicas on that node; it looks for the next range owned by another node, much like multi-DC replication ensures replicas end up in different data centers.

Now, when a new node is added, it doesn't choose adjacent tokens; its tokens are spread randomly around the ring. Just one of these small ranges can end up holding replicas for a large amount of data, because the new node becomes the next distinct node in the ring after a neighbouring node's entire contiguous block of tokens. For a high enough replication factor and certain (quite likely) choices of tokens, a new node could end up storing 100% of the data. This could explain what you are seeing, but I will need to see the token list and RF to confirm. (A short simulation sketch of this effect is included after the quoted issue description below.)

> Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5525
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: John Watson
>         Attachments: Screen Shot 2013-04-25 at 12.35.24 PM.png
>
>
> 12 node cluster upgraded from 1.1.9 to 1.2.3; enabled 'num_tokens: 256', restarted, and ran upgradesstables and cleanup.
> Tried to join 2 additional nodes into the ring.
> However, 1 of the new nodes ran out of disk space. This started causing 'no host id' alerts in the live cluster when attempting to store hints for that node.
> {noformat}
> ERROR 10:12:02,408 Exception in thread Thread[MutationStage:190,5,main]
> java.lang.AssertionError: Missing host ID
> {noformat}
> I killed the other node to stop it from continuing to join. The live cluster was now in some sort of broken state, dropping mutation messages on 3 nodes. This was fixed by restarting them; however, 1 node never stopped dropping them, so I had to decommission it (leaving the original cluster at 11 nodes).
> Ring pre-join:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 147.55 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 124.99 GB  256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 136.63 GB  256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 141.78 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 137.74 GB  256     16.7%             6d726cbf-147d-426e-a735-e14928c95e45
> 135.9 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 165.96 GB  256     16.7%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 135.41 GB  256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 143.38 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 178.05 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 194.92 GB  256     25.0%             361d7e31-b155-4ce1-8890-451b3ddf46cf
> 150.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}
> Ring after decommissioning the bad node:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
> 221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
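
To make the replica-placement effect described in the comment concrete, here is a minimal simulation sketch. It is not taken from the ticket: it assumes SimpleStrategy-style placement on a [0, 1) token ring, 12 old single-token nodes whose 256 vnodes are consecutive splits of their original ranges, RF = 3, and one new node with 256 random tokens. The node names and parameters are illustrative assumptions, not values from this cluster.

{noformat}
# Hypothetical sketch of the vnode replica-placement imbalance described above.
import random

OLD_NODES = 12   # nodes in the original single-token cluster (assumed)
VNODES    = 256  # num_tokens after the upgrade
RF        = 3    # replication factor (assumed)

# Splitting a node's single range into 256 consecutive sub-ranges keeps all of
# its tokens next to each other on the ring and moves no data.
ring = []  # (token, node) pairs
for i in range(OLD_NODES):
    start, width = i / OLD_NODES, 1 / OLD_NODES
    for k in range(VNODES):
        ring.append((start + (k + 1) * width / VNODES, f"old{i}"))

# The new node's tokens are scattered randomly around the ring instead.
random.seed(42)
ring += [(random.random(), "new") for _ in range(VNODES)]
ring.sort()

def replicas(index):
    """Distinct nodes replicating the range that ends at ring[index], found by
    walking the ring clockwise until RF distinct nodes have been seen."""
    found, i = [], index
    while len(found) < RF:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found

# Fraction of the token space each node holds a replica of.
owned = {}
for i, (token, _) in enumerate(ring):
    prev = ring[i - 1][0] if i else ring[-1][0] - 1.0  # range is (prev, token]
    for node in replicas(i):
        owned[node] = owned.get(node, 0.0) + (token - prev)

print(f"fair share per node : {RF / (OLD_NODES + 1):.1%}")
print(f"new node replicates : {owned['new']:.1%}")
{noformat}

Under these assumptions the new node ends up replicating essentially the whole data set (close to 100%), against a fair share of roughly RF/13, about 23%: a single random token landing inside a neighbour's contiguous block of vnodes is enough to make the new node the RF-th replica for that neighbour's entire original range. Running shuffle, or decommissioning and bootstrapping each node, breaks up the contiguous blocks and removes this imbalance.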