From: aaron morton
Subject: Re: Tripling size of a cluster
Date: Fri, 20 Jul 2012 21:02:27 +1200
To: user@cassandra.apache.org

I would check for stored hints in /var/lib/cassandra/data/system.

Putting nodes in different racks can make placement tricky, so… Are you running a multi-DC setup? Are you using NTS (NetworkTopologyStrategy)? What is the RF setting? What setting do you have for the Snitch? What are the full node assignments?
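A quick way to see whether hints are piling up is to look at the size of the hint sstables on each node. A minimal sketch, assuming the default data directory and the 1.1-era layout where hints live in the system keyspace's HintsColumnFamily (the path and file naming are assumptions; adjust for your data_file_directories):

# Minimal sketch: walk the system keyspace's data directory and report the
# size of any hint sstables. The path and the "HintsColumnFamily" name are
# the Cassandra 1.1 defaults and are assumptions here.
import os

DATA_DIR = "/var/lib/cassandra/data/system"

total = 0
for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        if name.startswith("HintsColumnFamily") and name.endswith("-Data.db"):
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            total += size
            print("%12d  %s" % (size, path))

print("stored hints: %.1f MB" % (total / 1048576.0))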
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/07/2012, at 6:00 PM, Mariusz Dymarek wrote:

> Hi again,
> we have now moved all nodes to the correct positions in the ring, but we can see a higher load on 2 nodes than on the other nodes:
> ...
> node01-05   rack1   Up   Normal   244.65 GB   6,67%   102084710076281539039012382229530463432
> node02-13   rack2   Up   Normal   240.26 GB   6,67%   107756082858297180096735292353393266961
> node01-13   rack1   Up   Normal   243.75 GB   6,67%   113427455640312821154458202477256070485
> node02-05   rack2   Up   Normal   249.31 GB   6,67%   119098828422328462212181112601118874004
> node01-14   rack1   Up   Normal   244.95 GB   6,67%   124770201204344103269904022724981677533
> node02-14   rack2   Up   Normal   392.7 GB    6,67%   130441573986359744327626932848844481058
> node01-06   rack1   Up   Normal   249.3 GB    6,67%   136112946768375385385349842972707284576
> node02-15   rack2   Up   Normal   286.82 GB   6,67%   141784319550391026443072753096570088106
> node01-15   rack1   Up   Normal   245.21 GB   6,67%   147455692332406667500795663220432891630
> node02-06   rack2   Up   Normal   244.9 GB    6,67%   153127065114422308558518573344295695148
> ...
>
> Nodes:
> * node01-15 => 286.82 GB
> * node02-14 => 392.7 GB
>
> The average load on all other nodes is around 245 GB; the nodetool cleanup command was invoked on the problematic nodes after the move operation...
> Why has this happened?
> And how can we balance the cluster?
>
> On 06.07.2012 20:15, aaron morton wrote:
>> If you have the time, yes, I would wait for the bootstrap to finish. It
>> will make your life easier.
>>
>> good luck.
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6/07/2012, at 7:12 PM, Mariusz Dymarek wrote:
>>
>>> Hi,
>>> we're in the middle of extending our cluster from 10 to 30 nodes,
>>> we're running cassandra 1.1.1...
>>> We've generated initial tokens for the new nodes:
>>> "0": 0, # existing: node01-01
>>> "1": 5671372782015641057722910123862803524, # new: node02-07
>>> "2": 11342745564031282115445820247725607048, # new: node01-07
>>> "3": 17014118346046923173168730371588410572, # existing: node02-01
>>> "4": 22685491128062564230891640495451214097, # new: node01-08
>>> "5": 28356863910078205288614550619314017621, # new: node02-08
>>> "6": 34028236692093846346337460743176821145, # existing: node01-02
>>> "7": 39699609474109487404060370867039624669, # new: node02-09
>>> "8": 45370982256125128461783280990902428194, # new: node01-09
>>> "9": 51042355038140769519506191114765231718, # existing: node02-02
>>> "10": 56713727820156410577229101238628035242, # new: node01-10
>>> "11": 62385100602172051634952011362490838766, # new: node02-10
>>> "12": 68056473384187692692674921486353642291, # existing: node01-03
>>> "13": 73727846166203333750397831610216445815, # new: node02-11
>>> "14": 79399218948218974808120741734079249339, # new: node01-11
>>> "15": 85070591730234615865843651857942052864, # existing: node02-03
>>> "16": 90741964512250256923566561981804856388, # new: node01-12
>>> "17": 96413337294265897981289472105667659912, # new: node02-12
>>> "18": 102084710076281539039012382229530463436, # existing: node01-05
>>> "19": 107756082858297180096735292353393266961, # new: node02-13
>>> "20": 113427455640312821154458202477256070485, # new: node01-13
>>> "21": 119098828422328462212181112601118874009, # existing: node02-05
>>> "22": 124770201204344103269904022724981677533, # new: node01-14
>>> "23": 130441573986359744327626932848844481058, # new: node02-14
>>> "24": 136112946768375385385349842972707284582, # existing: node01-06
>>> "25": 141784319550391026443072753096570088106, # new: node02-15
>>> "26": 147455692332406667500795663220432891630, # new: node01-15
>>> "27": 153127065114422308558518573344295695155, # existing: node02-06
>>> "28": 158798437896437949616241483468158498679, # new: node01-16
>>> "29": 164469810678453590673964393592021302203 # new: node02-16
>>> Then we started to bootstrap the new nodes,
>>> but due to a copy-and-paste mistake:
>>> * node node01-14 was started with
>>> 130441573986359744327626932848844481058 as its initial token (so node01-14
>>> has the initial_token that should belong to node02-14); it
>>> should have 124770201204344103269904022724981677533 as its initial_token
>>> * node node02-14 was started with
>>> 136112946768375385385349842972707284582 as its initial token, so it has
>>> the token of the existing node01-06...
>>>
>>> However, we used a different program to generate the previous
>>> initial_tokens, and the actual token of node01-06 in the ring is
>>> 136112946768375385385349842972707284576.
>>> Summing up: we currently have this situation in the ring:
>>>
>>> node02-05 rack2 Up Normal 596.31 GB 6.67%
>>> 119098828422328462212181112601118874004
>>> node01-14 rack1 Up Joining 242.92 KB 0.00%
>>> 130441573986359744327626932848844481058
>>> node01-06 rack1 Up Normal 585.5 GB 13.33%
>>> 136112946768375385385349842972707284576
>>> node02-14 rack2 Up Joining 113.17 KB 0.00%
>>> 136112946768375385385349842972707284582
>>> node02-15 rack2 Up Joining 178.05 KB 0.00%
>>> 141784319550391026443072753096570088106
>>> node01-15 rack1 Up Joining 191.7 GB 0.00%
>>> 147455692332406667500795663220432891630
>>> node02-06 rack2 Up Normal 597.69 GB 20.00%
>>> 153127065114422308558518573344295695148
>>>
>>> We would like to get back to our original configuration.
>>> Is it safe to wait for the bootstrapping of all the new nodes to finish and
>>> after that invoke:
>>> * nodetool -h node01-14 move 124770201204344103269904022724981677533
>>> * nodetool -h node02-14 move 130441573986359744327626932848844481058
>>> We should probably run nodetool cleanup on several nodes after that...
>>> Regards
>>> Dymarek Mariusz
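For reference, the new-node tokens quoted above are the standard evenly spaced RandomPartitioner tokens, i * 2**127 / N with N = 30. A minimal sketch that reproduces the quoted list (the pre-existing nodes differ by a few units because, as noted in the quoted message, their tokens came from a different generator):

# Minimal sketch: evenly spaced initial tokens for RandomPartitioner.
# With N = 30 this reproduces the new-node tokens quoted above.
RING_SIZE = 2 ** 127   # RandomPartitioner token space
N = 30                 # target cluster size

tokens = [i * RING_SIZE // N for i in range(N)]
for i, token in enumerate(tokens):
    print('"%d": %d' % (i, token))

Since the tokens themselves are evenly spaced, the remaining load difference is unlikely to be a token problem, which is presumably why the questions at the top of this reply focus on hints, the snitch and the replication settings.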