I was able to bootstrap the node successfully. The issue was RF > 2. Thanks again, Robert.


On Wed, Oct 30, 2013 at 10:29 AM, Narendra Sharma <narendra.sharma@gmail.com> wrote:
Thanks, Robert.

I didn't realize that some of the keyspaces had RF > 2 (not all of them, and in particular not the biggest one, which I was focusing on). I wasted 3 days on it. Thanks again for the pointers. I will try again and share the results.


On Wed, Oct 30, 2013 at 12:28 AM, Robert Coli <rcoli@eventbrite.com> wrote:
On Tue, Oct 29, 2013 at 11:45 AM, Narendra Sharma <narendra.sharma@gmail.com> wrote:
We had a cluster of 4 nodes in AWS, with an average load of approximately 750 GB per node. We added 4 new nodes. It has now been more than 30 hours, and the new nodes are still in JOINING mode.
Specifically, I am analyzing the one with IP 10.3.1.29. No compaction, streaming, or index building is happening on it.

If your cluster has RF > 2, you are bootstrapping two nodes into the same range simultaneously, which is not supported [1,2]. The node you are having the problem with is probably in the overlapping range.
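The overlap described above can be illustrated with a simplified model. The Python sketch below assumes one token per node and SimpleStrategy-style placement, where each range is stored on its owner and on the next RF - 1 nodes clockwise, so a node ends up holding its own range plus the ranges of the RF - 1 nodes before it. The tokens, node names, and the interleaved layout (each new node placed between two existing ones, as when doubling a 4-node cluster) are hypothetical, and vnodes, NetworkTopologyStrategy, and racks are ignored. It is a sketch of the reasoning, not Cassandra's actual pending-range calculation.

```python
from itertools import combinations


def replica_ranges(ring, node, rf):
    """Return the ranges `node` holds replicas for (simplified model).

    `ring` is a list of (token, name) pairs, one token per node. A range is
    identified by the name of the node whose token closes it, i.e. the range
    (previous_token, token]. Assuming SimpleStrategy-style placement, a node
    stores its own range plus the ranges of the rf - 1 nodes before it.
    """
    names = [name for _, name in sorted(ring)]  # ring order by token
    i = names.index(node)
    count = min(rf, len(names))
    return {names[(i - k) % len(names)] for k in range(count)}


def overlapping_joins(existing, joining, rf):
    """List pairs of joining nodes that would take on replicas of the same range."""
    ring = existing + joining
    joining_names = [name for _, name in joining]
    ranges = {n: replica_ranges(ring, n, rf) for n in joining_names}
    return [(a, b) for a, b in combinations(joining_names, 2)
            if ranges[a] & ranges[b]]


if __name__ == "__main__":
    # Hypothetical tokens: 4 existing nodes, 4 new nodes interleaved between them.
    existing = [(0, "O1"), (200, "O2"), (400, "O3"), (600, "O4")]
    joining = [(100, "N1"), (300, "N2"), (500, "N3"), (700, "N4")]
    print(overlapping_joins(existing, joining, rf=2))  # → []
    print(overlapping_joins(existing, joining, rf=3))
    # → [('N1', 'N2'), ('N1', 'N4'), ('N2', 'N3'), ('N3', 'N4')]
```

In this model, with RF = 2 each new node's ranges are disjoint from the other new nodes' ranges, while with RF = 3 every new node shares a range with its neighbouring new nodes, which matches the RF > 2 case where simultaneous bootstraps collide.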

If I were you, I would:

1) stop all "Joining" nodes and wipe their state, including the system keyspace
2) optionally, "removetoken" any nodes that remain in the cluster's gossip state after stopping
3) restart/bootstrap them one at a time, waiting for each to finish bootstrapping before starting the next one
4) (unrelated) upgrade from 1.1.6 to the head of 1.1.x ASAP.
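Step 3 amounts to a strictly sequential loop: start one node, wait until it has left the Joining state, then move on. The Python sketch below shows only that control flow. `start_node` and `get_state` are hypothetical hooks the operator would supply (for example, wrappers around however nodes are started and how their ring state is read); the sketch does not call or parse nodetool itself, and the state strings "Joining" and "Normal" are assumed to be what `get_state` returns.

```python
import time
from typing import Callable, Iterable, List


def bootstrap_one_at_a_time(
    nodes: Iterable[str],
    start_node: Callable[[str], None],   # hypothetical hook: starts/bootstraps a node
    get_state: Callable[[str], str],     # hypothetical hook: returns e.g. "Joining" or "Normal"
    poll_seconds: float = 60.0,
    timeout_seconds: float = 48 * 3600,
) -> List[str]:
    """Bootstrap nodes sequentially, never starting the next until the previous is Normal."""
    finished = []
    for node in nodes:
        start_node(node)
        deadline = time.monotonic() + timeout_seconds
        # Block until this node reports Normal, so no two bootstraps overlap in time.
        while get_state(node) != "Normal":
            if time.monotonic() > deadline:
                raise TimeoutError(f"{node} still not Normal after {timeout_seconds}s")
            time.sleep(poll_seconds)
        finished.append(node)
    return finished
```

The timeout is there so that a node stuck in JOINING, like the one described in this thread, surfaces as an error instead of blocking the remaining nodes indefinitely.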

=Rob



--
Narendra Sharma



--
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/