Date: Tue, 30 Apr 2013 10:50:15 +0000 (UTC)
From: "Richard Low (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5525) Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load

    [ https://issues.apache.org/jira/browse/CASSANDRA-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645445#comment-13645445 ]

Richard Low commented on CASSANDRA-5525:
----------------------------------------

Could you attach the output of 'nodetool ring' to list all the tokens? Also, what is your replication factor?

There is a balancing problem when adding new nodes without running shuffle (or decommissioning and bootstrapping each node).

When Cassandra increases the number of tokens from 1 to N (256 in your case), it splits each original range into N consecutive ranges. This doesn't change where the data lives, but it does increase the number of tokens. Cassandra knows that the adjacent tokens belong to the same node, so it doesn't try to store replicas on that node; it looks for the next range owned by another node, much like multi-DC replication ensures replicas end up in different data centers.

Now, when a new node is added, it doesn't choose adjacent tokens; its tokens are spread randomly around the ring. Just one of these small ranges can end up holding replicas for a large amount of data, because the new node becomes the next distinct node in the ring after a neighbouring node's entire contiguous block of tokens. For a high enough replication factor and certain (quite likely) choices of tokens, a new node could end up storing 100% of the data. This could explain what you are seeing, but I will need to see the token list and RF to confirm. (A short simulation sketch of this effect is included after the quoted issue description below.)

> Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5525
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: John Watson
>         Attachments: Screen Shot 2013-04-25 at 12.35.24 PM.png
>
>
> 12 node cluster upgraded from 1.1.9 to 1.2.3; enabled 'num_tokens: 256', restarted, and ran upgradesstables and cleanup.
> Tried to join 2 additional nodes into the ring.
> However, 1 of the new nodes ran out of disk space. This started causing 'no host id' alerts in the live cluster when attempting to store hints for that node.
> {noformat}
> ERROR 10:12:02,408 Exception in thread Thread[MutationStage:190,5,main]
> java.lang.AssertionError: Missing host ID
> {noformat}
> I killed the other node to stop it from continuing to join. The live cluster was now in some sort of broken state, dropping mutation messages on 3 nodes. This was fixed by restarting them; however, 1 node never stopped dropping them, so I had to decommission it (leaving the original cluster at 11 nodes).
> Ring pre-join:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 147.55 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 124.99 GB  256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 136.63 GB  256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 141.78 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 137.74 GB  256     16.7%             6d726cbf-147d-426e-a735-e14928c95e45
> 135.9 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 165.96 GB  256     16.7%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 135.41 GB  256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 143.38 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 178.05 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 194.92 GB  256     25.0%             361d7e31-b155-4ce1-8890-451b3ddf46cf
> 150.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}
> Ring after decommissioning the bad node:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
> 221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
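
To make the replica-placement effect described in the comment concrete, here is a minimal simulation sketch. It is not taken from the ticket: it assumes SimpleStrategy-style placement on a [0, 1) token ring, 12 old single-token nodes whose 256 vnodes are consecutive splits of their original ranges, RF = 3, and one new node with 256 random tokens. The node names and parameters are illustrative assumptions, not values from this cluster.

{noformat}
# Hypothetical sketch of the vnode replica-placement imbalance described above.
import random

OLD_NODES = 12   # nodes in the original single-token cluster (assumed)
VNODES    = 256  # num_tokens after the upgrade
RF        = 3    # replication factor (assumed)

# Splitting a node's single range into 256 consecutive sub-ranges keeps all of
# its tokens next to each other on the ring and moves no data.
ring = []  # (token, node) pairs
for i in range(OLD_NODES):
    start, width = i / OLD_NODES, 1 / OLD_NODES
    for k in range(VNODES):
        ring.append((start + (k + 1) * width / VNODES, f"old{i}"))

# The new node's tokens are scattered randomly around the ring instead.
random.seed(42)
ring += [(random.random(), "new") for _ in range(VNODES)]
ring.sort()

def replicas(index):
    """Distinct nodes replicating the range that ends at ring[index], found by
    walking the ring clockwise until RF distinct nodes have been seen."""
    found, i = [], index
    while len(found) < RF:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found

# Fraction of the token space each node holds a replica of.
owned = {}
for i, (token, _) in enumerate(ring):
    prev = ring[i - 1][0] if i else ring[-1][0] - 1.0  # range is (prev, token]
    for node in replicas(i):
        owned[node] = owned.get(node, 0.0) + (token - prev)

print(f"fair share per node : {RF / (OLD_NODES + 1):.1%}")
print(f"new node replicates : {owned['new']:.1%}")
{noformat}

Under these assumptions the new node ends up replicating essentially the whole data set (close to 100%), against a fair share of roughly RF/13, about 23%: a single random token landing inside a neighbour's contiguous block of vnodes is enough to make the new node the RF-th replica for that neighbour's entire original range. Running shuffle, or decommissioning and bootstrapping each node, breaks up the contiguous blocks and removes this imbalance.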