Thanks for the response. We are still having issues bootstrapping a node. Quick background on where we are (1.2.8 with Vnodes):
I'm at a bit of a loss. Honestly, bootstrapping nodes has been a total nightmare for me, and it makes me very concerned about our ability to fix/grow our cluster as needed. I hoped Vnodes would help, but so far no luck. Here are the options as I see them:
Thanks for the help!

Existing streaming node 1 (10.8.44.98):
ERROR [GossipTasks:1] 2013-10-03 13:09:28,654 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [GossipTasks:1] 2013-10-03 13:09:28,720 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)

Existing streaming node 2 (10.8.44.72):
ERROR [GossipTasks:1] 2013-10-03 13:10:02,174 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [GossipTasks:1] 2013-10-03 13:10:02,185 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.84 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [ReplicateOnWriteStage:38] 2013-10-03 13:10:02,265 FailureDetector.java (line 154) unknown endpoint /10.8.44.84
ERROR [ReplicateOnWriteStage:36] 2013-10-03 13:10:02,302 FailureDetector.java (line 154) unknown endpoint /10.8.44.84
ERROR [Native-Transport-Requests:151] 2013-10-03 13:10:02,282 FailureDetector.java (line 154) unknown endpoint /10.8.44.84
ERROR [ReplicateOnWriteStage:37] 2013-10-03 13:10:02,318 FailureDetector.java (line 154) unknown endpoint /10.8.44.84

Bootstrapping node (10.8.44.84):
ERROR [GossipTasks:1] 2013-10-03 13:09:23,196 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.98 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
ERROR [GossipTasks:1] 2013-10-03 13:09:24,199 AbstractStreamSession.java (line 110) Stream failed because /10.8.44.72 died or was restarted/removed (streams may still be active in background, but further streams won't be started)


From: Robert Coli <rcoli@eventbrite.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, October 2, 2013 1:55 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Unable to bootstrap new node

On Wed, Oct 2, 2013 at 8:12 AM, Keith Wright <kwright@nanigans.com> wrote:
   We are running C* 1.2.8 with Vnodes enabled and are having issues bootstrapping a new node. When we add the node, we see it bootstrap and we see data start to stream over from other nodes; however, one of the other nodes gets stuck in full GCs, to the point where we had to restart one of the nodes. I assume this is because building the Merkle tree is expensive.

Merkle trees are only involved in "repair", not in normal bootstrap. Have you considered lowering the throttle for streaming? Bootstrap will be slower but should be less likely to overwhelm heap.
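
As a rough sketch of what lowering the throttle could look like: the cap can be changed at runtime on each existing node with `nodetool setstreamthroughput`, and persisted with `stream_throughput_outbound_megabits_per_sec` in cassandra.yaml. The helper below only builds the nodetool commands and runs them on request. The function names, the host list (taken from the logs in this thread) and the value of 50 are illustrative assumptions, not a recommendation for this cluster.

```python
import subprocess


def throttle_command(host, megabits_per_sec):
    """Build the nodetool call that caps outbound streaming on one node."""
    if megabits_per_sec < 0:
        raise ValueError("throughput must be >= 0 (0 disables the throttle)")
    return ["nodetool", "-h", host, "setstreamthroughput", str(megabits_per_sec)]


def apply_throttle(hosts, megabits_per_sec, dry_run=True):
    """Return the commands for all hosts; execute them only if dry_run is False."""
    commands = [throttle_command(h, megabits_per_sec) for h in hosts]
    if not dry_run:
        for cmd in commands:
            # Raises CalledProcessError if nodetool exits with a non-zero status.
            subprocess.check_call(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical values: the two streaming nodes from the logs, capped at 50.
    for cmd in apply_throttle(["10.8.44.98", "10.8.44.72"], 50):
        print(" ".join(cmd))
```

Running it as-is only prints the commands; passing `dry_run=False` would execute them, which assumes `nodetool` is on the PATH and can reach each host over JMX. A value set this way does not survive a restart unless cassandra.yaml is updated as well.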
 
Is there any way to force the streaming to restart? Have others seen this?

In the bootstrap case, you can just wipe the bootstrapping node and re-start the bootstrap.

In the general case regarding hung streaming:


The only solution to hung non-bootstrap streaming is to restart all nodes participating in the streaming. With vnodes, this will probably approach 100% of nodes...

=Rob