incubator-cassandra-user mailing list archives

From E S <>
Subject Safely adding new nodes without losing data
Date Sat, 20 Jul 2013 14:30:28 GMT
I am trying to understand the best procedure for adding new nodes. The procedure I see most
often online seems to have a hole: a small but real chance of permanently losing data. I want
to understand what I am missing.

Let's say I have a 3-node cluster (nodes A, B, C) with an RF of 3. I want to double the cluster
size to 6 (nodes A, B, C, D, E, F) while keeping the replication factor at 3. Let's assume we
use vnodes.

My understanding is that the procedure is to bootstrap the 3 new nodes, then run repair, then
cleanup. Here is my failure case:

Before bootstrapping, I have a row that is replicated only onto nodes A and B. Assume I did
a quorum write, there was some hiccup on C, hinted handoff didn't work, and a repair has
not yet been run. Let's also assume that once nodes D, E, and F have been bootstrapped, this
row's new replicas are D, E, and F.

My reading of the bootstrapping code shows that, for a given range, a bootstrapping node
streams it from only one source node (unlike repair, which compares all replicas). By my
estimate there is a 1/9 chance that D, E, and F will all have streamed the range containing
the row from C, which does not have this row.

Now not even a read at consistency level ALL will return the row. A repair will not recover
it, and once cleanup is run on the old nodes, the row is permanently deleted.

I don't think this problem would normally happen without vnodes, because when doubling you
would alternate the new nodes with the old nodes in the ring. So while quorum might not work
until the final repair, ALL would, and a repair would solve the problem. With vnodes, though,
some of the ranges will follow the pattern above (range ownership moving entirely from A, B, C
to D, E, F).
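To illustrate the non-vnode point, here is a toy sketch using SimpleStrategy-style placement (each range's replicas are the next RF distinct nodes clockwise; this is my simplification of the real placement logic). With the new nodes alternated between the old ones, every post-expansion replica set still contains at least one of the original nodes:

```python
# Toy illustration of the non-vnode case, using SimpleStrategy-style
# placement (replicas = next RF nodes clockwise). This is a
# simplification of Cassandra's real placement logic.
RF = 3
old_ring = ["A", "B", "C"]
new_ring = ["A", "D", "B", "E", "C", "F"]   # new nodes alternated in

def replica_sets(ring, rf):
    n = len(ring)
    return [[ring[(i + j) % n] for j in range(rf)] for i in range(n)]

for rs in replica_sets(new_ring, RF):
    old_members = [n for n in rs if n in old_ring]
    print(rs, "-> old replicas kept:", old_members)
    # every range keeps at least one pre-expansion replica
    assert old_members
```

With vnodes, by contrast, nothing guarantees that property for every range, which is exactly the pattern I'm worried about.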

Am I missing something here? If I'm right, I think the only way to avoid this is to add fewer
than a quorum of new nodes (in this case, one at a time) before doing a repair. That would be
painful, since repairs take a while.

Thanks for any help.
