Hi all,

  We are still having issues bootstrapping nodes, and this is becoming quite a blocker for us.  We are seeing the same behavior where bootstrapping the node causes one or more existing nodes to hang in GC (see attached screenshot).  Increasing the heap and new generation sizes has not helped, nor has increasing the phi_convict_threshold to 12.  The email below gives more history.  Any ideas would be VERY welcome!
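
For reference, here is roughly where those settings live in our setup (the 14 GB heap value is ours; the new-generation value below is just an illustrative placeholder, and your conf paths may differ):

  # conf/cassandra-env.sh -- heap and new generation sizes we have been raising
  MAX_HEAP_SIZE="14G"
  HEAP_NEWSIZE="1600M"   # placeholder value; we have been experimenting with this

  # conf/cassandra.yaml -- failure detector threshold we raised
  phi_convict_threshold: 12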

Thanks

From: Keith Wright <kwright@nanigans.com>
Date: Thursday, October 3, 2013 10:14 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Don Jackson <djackson@nanigans.com>
Subject: Re: Unable to bootstrap new node

Thanks for the response.   We are still having issues bootstrapping a node.  Quick background on where we are at (1.2.8 with Vnodes):
  • We had a node start complaining about corrupted SSTables, which we tried to delete one by one, but it quickly became a whack-a-mole problem, so we decided to just wipe the node and re-bootstrap it
  • We shut down that node and ran nodetool removenode from another node (command sketched after this list)
  • We wiped the affected node's data and then attempted to bootstrap it (with the same IP of course)
  • Every time we attempt to add the node, 2 out of the 4 nodes sending data (the same 2 nodes, by the way) have streaming failures, which I believe are caused by GC (see the logging below).  The streaming from these two nodes fails within the first couple of minutes of bootstrapping the node.
  • We tried restarting the nodes that failed to stream, but the bootstrapping node did not automatically re-attempt the streaming, and again we couldn't find a way to force it to
  • We have tried increasing the heap and new generation sizes on the nodes to reduce the GC pressure (from our original 10 GB heap to 14 GB) with no luck, and we also decreased stream_throughput_outbound_megabits_per_sec from 400 to 200 (see the sketch after this list)
  • Eventually the bootstrapping node just hangs, as it never gets data from those 2 nodes, and I cannot find any way to make it re-attempt the streams
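For concreteness, the commands and setting referenced in the list above look roughly like this in our environment (the host ID is a placeholder; values and paths are ours and may differ in yours):

  # run on a live node to remove the dead node's ownership (host ID from 'nodetool status')
  nodetool removenode <host-id-of-dead-node>

  # run on the sending nodes to watch streaming progress during the bootstrap
  nodetool netstats

  # conf/cassandra.yaml on the sending nodes -- the throttle we lowered
  stream_throughput_outbound_megabits_per_sec: 200
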
I'm at a bit of a loss.  Honestly, bootstrapping nodes has been a total nightmare for me and makes me very concerned about our ability to fix/grow our cluster as needed.  I hoped vnodes would help, but so far no luck.  Here are the options as I see them:
  • Hope someone here has a great idea on how to fix it :)
  • Assuming I can't get the node to bootstrap, I can start it with bootstrap disabled and trigger a repair.  Is there any way to ensure it doesn't serve any reads during that time?  I can disable the thrift/binary ports, but it will still handle requests from other coordinator nodes.  We usually run at read ANY, so to ensure we don't miss data we would need to run at QUORUM until the repair completes (a rough sketch of what I mean follows this list).
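A rough sketch of what I have in mind for that fallback, in case it helps frame the question (I am not sure this actually stops the node from serving reads routed through other coordinators, which is exactly what I am asking):

  # conf/cassandra.yaml on the rebuilt node -- start without streaming
  auto_bootstrap: false

  # once it is up, stop accepting direct client connections
  nodetool disablethrift
  nodetool disablebinary

  # then rebuild the data
  nodetool repair -pr
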
Thanks for the help!

Existing streaming node 1 (10.8.44.98):
ERROR [GossipTasks:1] 2013-10-03 13:09:28,654 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [GossipTasks:1] 2013-10-03 13:09:28,720 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)

Existing streaming node 2 (10.8.44.72):
ERROR [GossipTasks:1] 2013-10-03 13:10:02,174 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [GossipTasks:1] 2013-10-03 13:10:02,185 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [ReplicateOnWriteStage:38] 2013-10-03 13:10:02,265 FailureDetector.java (line 154) unknown endpoint /10.8.44.84
ERROR [ReplicateOnWriteStage:36] 2013-10-03 13:10:02,302 FailureDetector.java (line 154) unknown endpoint /10.8.44.84
ERROR [Native-Transport-Requests:151] 2013-10-03 13:10:02,282 FailureDetector.java (line 154) unknown endpoint /10.8.44.84
ERROR [ReplicateOnWriteStage:37] 2013-10-03 13:10:02,318 FailureDetector.java (line 154) unknown endpoint /10.8.44.84

Bootstrapping node (10.8.44.84):
ERROR [GossipTasks:1] 2013-10-03 13:09:23,196 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.98 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [GossipTasks:1] 2013-10-03 13:09:24,199 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.72 died or was restarted/removed (streams may still be active in background, but further streams won't be started)


From: Robert Coli <rcoli@eventbrite.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, October 2, 2013 1:55 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Unable to bootstrap new node

On Wed, Oct 2, 2013 at 8:12 AM, Keith Wright <kwright@nanigans.com> wrote:
   We are running C* 1.2.8 with vnodes enabled and are attempting to bootstrap a new node, but are having issues.  When we add the node, we see it bootstrap and data start to stream over from other nodes; however, one of the other nodes gets stuck in full GCs, to the point where we had to restart it.  I assume this is because building the merkle tree is expensive.

Merkle trees are only involved in "repair", not in a normal bootstrap. Have you considered lowering the throttle for streaming? Bootstrap will be slower but should be less likely to overwhelm the heap.
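
For example, something like the following (I believe the nodetool knob can be changed live on 1.2; the value is just illustrative):

  # temporarily lower the streaming throttle on the sending nodes (megabits/sec)
  nodetool setstreamthroughput 100

  # or persistently, in conf/cassandra.yaml
  stream_throughput_outbound_megabits_per_sec: 100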
 
Any way to force the streaming to restart?   Have others seen this?

In the bootstrap case, you can just wipe the bootstrapping node and re-start the bootstrap.
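
Roughly, with Cassandra stopped on the bootstrapping node (the paths below are the defaults; use whatever your cassandra.yaml points at):

  rm -rf /var/lib/cassandra/data/* \
         /var/lib/cassandra/commitlog/* \
         /var/lib/cassandra/saved_caches/*

  # then start Cassandra again and it will attempt a fresh bootstrap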

In the general case regarding hung streaming:


The only solution to hung non-bootstrap streaming is to restart all nodes participating in the streaming. With vnodes, this will probably approach 100% of nodes...
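
Something like the following on each participating node in turn (the restart command depends on how you run Cassandra):

  # flush memtables and stop accepting writes before restarting
  nodetool drain
  sudo service cassandra restart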

=Rob