incubator-cassandra-user mailing list archives

From Vanger <>
Subject Adding node to Cassandra
Date Mon, 12 Mar 2012 09:23:13 GMT
We have a 4-node Cassandra cluster with RF = 3 (nodes named 'A' to 'D', 
initial tokens:
A (25%): 20543402371996174596346065790779111550,
B (25%): 63454860067234500516210522518260948578,
C (25%): 106715317233367107622067286720208938865,
D (25%): 150141183460469231731687303715884105728),
and want to add a 5th node ('E') with initial token = 
164163260474281062972548100673162157075. Then we want to rebalance nodes 
A, D, and E so that they own equal percentages of the data. All nodes have 
~400 GB of data and ~300GB of free disk space.
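
For reference, evenly spaced initial tokens can be sketched as below. This is a minimal sketch that assumes the cluster uses RandomPartitioner (token range 0 to 2**127), which the token values above suggest but the message does not state. The function name balanced_tokens is illustrative and not part of any Cassandra tool.

```python
# Assumption: RandomPartitioner, whose tokens span 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return node_count tokens spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Evenly spaced tokens for a 5-node ring (node names are illustrative).
    for name, token in zip("ABCDE", balanced_tokens(5)):
        print(name, token)
```

Note that a fully balanced 5-node ring built this way would require moving every node's token, not only those of A, D, and E.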
What we did:
1. 'Join' the new Cassandra instance (node 'E') to the cluster and wait until 
it loads the data for its token range.

2. Move node 'D' initial token down from 150... to 130...
Here we ran into a problem. When the "move" started, disk usage on node 'D' 
grew from 400 to 750GB. We saw compactions running on node 'D', but some 
of them failed with "WARN [CompactionExecutor:580] 2012-03-11 
16:57:56,036 (line 87) insufficient space to compact 
all requested files SSTableReader". After that we killed the "move" process 
to avoid an "out of disk space" error (when 5GB of free space was left). After 
a restart it freed 100GB of space, and now we have a total of 105GB of free 
disk space on node 'D'. We also noticed disk usage increase by ~150GB on 
node 'B', but it stopped growing before we stopped the token move.

So now we have 5 nodes in the cluster, with status like this:
Node   Owns%   Load     Init. token
A      16%     400GB    020...
B      25%     520GB    063...
C      25%     400GB    106...
D      25%     640GB    150...
E       9%     300GB    164...
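
The ownership figures can be approximated from the initial tokens alone. The following is a minimal sketch, again assuming RandomPartitioner (ring size 2**127); the helper name ownership is hypothetical. It computes each node's primary range only, so with RF = 3 the data actually stored on a node also includes replicas of other nodes' ranges.

```python
# Assumption: RandomPartitioner, whose tokens span 0 .. 2**127.
RING_SIZE = 2 ** 127

# Initial tokens as given in this message.
tokens = {
    "A": 20543402371996174596346065790779111550,
    "B": 63454860067234500516210522518260948578,
    "C": 106715317233367107622067286720208938865,
    "D": 150141183460469231731687303715884105728,
    "E": 164163260474281062972548100673162157075,
}


def ownership(node_tokens):
    """Fraction of the ring in each node's primary range (previous token, own token]."""
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    shares = {}
    for index, (name, token) in enumerate(ordered):
        previous = ordered[index - 1][1]  # index 0 wraps around to the last node
        span = (token - previous) % RING_SIZE or RING_SIZE  # a single node owns everything
        shares[name] = span / RING_SIZE
    return shares


if __name__ == "__main__":
    for name, share in sorted(ownership(tokens).items()):
        print(f"{name}: {share:.1%}")
```

The same function can be used to preview the effect of a planned move by changing a token in the dictionary before calling it.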

We'll add disk space to all nodes and run some cleanups, but some 
questions remain:

What is the best next step for us from this point?
What is the correct procedure overall, and what should we expect when 
adding a node to a Cassandra cluster?
We expected used disk space on node 'D' to decrease because we shrank the 
token range for this node, but we saw the opposite. Why did that happen, and 
is it normal behavior?
What if we had 2TB of data on a 2.5TB disk and wanted to add 
another node and move tokens?
Is it possible to automate adding a node to the cluster and be sure we won't 
run out of space?

