incubator-cassandra-user mailing list archives

From Mariusz Dymarek <mdyma...@opera.com>
Subject Re: Tripling size of a cluster
Date Thu, 19 Jul 2012 06:00:19 GMT
Hi again,
we have now moved all nodes to their correct positions in the ring, but we can
see a higher load on 2 nodes than on the other nodes:
...
node01-05   rack1  Up  Normal  244.65 GB  6,67% 
102084710076281539039012382229530463432
node02-13   rack2  Up  Normal  240.26 GB  6,67% 
107756082858297180096735292353393266961
node01-13   rack1  Up  Normal  243.75 GB  6,67% 
113427455640312821154458202477256070485
node02-05   rack2  Up  Normal  249.31 GB  6,67% 
119098828422328462212181112601118874004
node01-14   rack1  Up  Normal  244.95 GB  6,67% 
124770201204344103269904022724981677533
node02-14   rack2  Up  Normal  392.7 GB   6,67% 
130441573986359744327626932848844481058
node01-06   rack1  Up  Normal  249.3 GB   6,67% 
136112946768375385385349842972707284576
node02-15   rack2  Up  Normal  286.82 GB  6,67% 
141784319550391026443072753096570088106
node01-15   rack1  Up  Normal  245.21 GB  6,67% 
147455692332406667500795663220432891630
node02-06   rack2  Up  Normal  244.9 GB   6,67% 
153127065114422308558518573344295695148
...

Nodes with higher load:
* node02-15 => 286.82 GB
* node02-14 => 392.7 GB

The average load on all other nodes is around 245 GB, and the nodetool cleanup
command was invoked on the problematic nodes after the move operation...
Why has this happened?
And how can we balance the cluster?
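
For what it's worth, here is a minimal Python sketch (not a Cassandra tool; it only
assumes the RandomPartitioner token space of 0..2**127 and the tokens copied from the
ring output above) that checks whether the consecutive primary ranges in that slice
are equally sized, i.e. whether the imbalance can be explained by token placement at all:

# Minimal sketch: check that consecutive primary ranges in the ring slice
# above are equally sized. Assumes RandomPartitioner (token space 0..2**127);
# the (node, token) pairs are copied from the `nodetool ring` output.
RING_SIZE = 2 ** 127
ring_slice = [
    ("node02-05", 119098828422328462212181112601118874004),
    ("node01-14", 124770201204344103269904022724981677533),
    ("node02-14", 130441573986359744327626932848844481058),
    ("node01-06", 136112946768375385385349842972707284576),
    ("node02-15", 141784319550391026443072753096570088106),
    ("node01-15", 147455692332406667500795663220432891630),
    ("node02-06", 153127065114422308558518573344295695148),
]
expected = RING_SIZE // 30  # ideal range width for 30 evenly spaced nodes
for (_, prev_token), (node, token) in zip(ring_slice, ring_slice[1:]):
    width = (token - prev_token) % RING_SIZE
    print(f"{node}: range width = {width} "
          f"({width / expected:.4f} x the ideal 1/30 of the ring)")

If the widths come out equal (they do for the tokens above, to within a few units),
the extra gigabytes on those two nodes are data sitting on disk rather than uneven
token ownership.
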
On 06.07.2012 20:15, aaron morton wrote:
> If you have the time, yes, I would wait for the bootstrap to finish. It
> will make your life easier.
>
> good luck.
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/07/2012, at 7:12 PM, Mariusz Dymarek wrote:
>
>> Hi,
>> we're in the middle of extending our cluster from 10 to 30 nodes, and
>> we're running Cassandra 1.1.1...
>> We've generated initial tokens for the new nodes (a short sketch
>> reproducing this spacing follows the list):
>> "0": 0, # existing: node01-01
>> "1": 5671372782015641057722910123862803524, # new: node02-07
>> "2": 11342745564031282115445820247725607048, # new: node01-07
>> "3": 17014118346046923173168730371588410572, # existing: node02-01
>> "4": 22685491128062564230891640495451214097, # new: node01-08
>> "5": 28356863910078205288614550619314017621, # new: node02-08
>> "6": 34028236692093846346337460743176821145, # existing: node01-02
>> "7": 39699609474109487404060370867039624669, # new: node02-09
>> "8": 45370982256125128461783280990902428194, # new: node01-09
>> "9": 51042355038140769519506191114765231718, # existing: node02-02
>> "10": 56713727820156410577229101238628035242, # new: node01-10
>> "11": 62385100602172051634952011362490838766, # new: node02-10
>> "12": 68056473384187692692674921486353642291, # existing: node01-03
>> "13": 73727846166203333750397831610216445815, # new: node02-11
>> "14": 79399218948218974808120741734079249339, # new: node01-11
>> "15": 85070591730234615865843651857942052864, # existing: node02-03
>> "16": 90741964512250256923566561981804856388, # new: node01-12
>> "17": 96413337294265897981289472105667659912, # new: node02-12
>> "18": 102084710076281539039012382229530463436, # existing: node01-05
>> "19": 107756082858297180096735292353393266961, # new: node02-13
>> "20": 113427455640312821154458202477256070485, # new: node01-13
>> "21": 119098828422328462212181112601118874009, # existing: node02-05
>> "22": 124770201204344103269904022724981677533, # new: node01-14
>> "23": 130441573986359744327626932848844481058, # new: node02-14
>> "24": 136112946768375385385349842972707284582, # existing: node01-06
>> "25": 141784319550391026443072753096570088106, # new: node02-15
>> "26": 147455692332406667500795663220432891630, # new: node01-15
>> "27": 153127065114422308558518573344295695155, # existing: node02-06
>> "28": 158798437896437949616241483468158498679, # new: node01-16
>> "29": 164469810678453590673964393592021302203 # new: node02-16
>> Then we've started to bootstrap the new nodes, but due to a copy-and-paste
>> mistake:
>> * node01-14 was started with 130441573986359744327626932848844481058 as
>> its initial token (so node01-14 has the initial_token that should belong
>> to node02-14); it should have 124770201204344103269904022724981677533 as
>> its initial_token
>> * node02-14 was started with 136112946768375385385349842972707284582 as
>> its initial token, so it has the token from the existing node01-06....
>>
>> However, we've used a different program to generate the previous
>> initial_tokens, and the actual token of node01-06 in the ring is
>> 136112946768375385385349842972707284576.
>> Summing up, we currently have this situation in the ring:
>>
>> node02-05 rack2 Up Normal 596.31 GB 6.67%
>> 119098828422328462212181112601118874004
>> node01-14 rack1 Up Joining 242.92 KB 0.00%
>> 130441573986359744327626932848844481058
>> node01-06 rack1 Up Normal 585.5 GB 13.33%
>> 136112946768375385385349842972707284576
>> node02-14 rack2 Up Joining 113.17 KB 0.00%
>> 136112946768375385385349842972707284582
>> node02-15 rack2 Up Joining 178.05 KB 0.00%
>> 141784319550391026443072753096570088106
>> node01-15 rack1 Up Joining 191.7 GB 0.00%
>> 147455692332406667500795663220432891630
>> node02-06 rack2 Up Normal 597.69 GB 20.00%
>> 153127065114422308558518573344295695148
>>
>>
>> We would like to get back to our original configuration.
>> Is it safe to wait for all new nodes to finish bootstrapping and after
>> that invoke:
>> * nodetool -h node01-14 move 124770201204344103269904022724981677533
>> * nodetool -h node02-14 move 130441573986359744327626932848844481058
>> We should probably run nodetool cleanup on several nodes after that...
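>>
>> If it helps, a hypothetical driver for that plan could look like the
>> Python sketch below (the host names, tokens and subprocess wrapper are
>> only an illustration of the intended order: move one node at a time, then
>> clean up):
>>
>> import subprocess
>>
>> # Target tokens taken from the generated list above (illustrative only).
>> moves = {
>>     "node01-14": "124770201204344103269904022724981677533",
>>     "node02-14": "130441573986359744327626932848844481058",
>> }
>> # Run the moves one at a time, then cleanup on the moved nodes; the
>> # neighbours that hand off ranges would need cleanup as well.
>> for host, token in moves.items():
>>     subprocess.run(["nodetool", "-h", host, "move", token], check=True)
>> for host in moves:
>>     subprocess.run(["nodetool", "-h", host, "cleanup"], check=True)
>>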
>> Regards
>> Dymarek Mariusz
>

