incubator-cassandra-user mailing list archives

From Robert Coli <rc...@eventbrite.com>
Subject Re: problem removing dead node from ring
Date Wed, 04 Jun 2014 01:03:07 GMT
On Tue, Jun 3, 2014 at 3:48 PM, Matthew Allen <matthew.j.allen@gmail.com>
wrote:

> Just out of curiosity, for a dead node, would it be possible to just
>
>  - replace the node (no data in data/commit dirs), same IP Address, same
> hostname.
>  - restore the cassandra.yaml (initial_token etc)
>  - set auto_bootstrap:false
>  - start it up and then run a nodetool rebuild ?
>
> Or would the Host ID value change with the new node ?
>

That would work, but until CASSANDRA-6961 [1] is resolved there is no way
to prevent this node from serving stale reads at CLs below QUORUM during
the potentially long window before the rebuild completes.
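
A rough way to see why CLs below QUORUM are exposed here: under the usual
Dynamo-style rule, a read is only guaranteed to overlap the latest
acknowledged write when the number of replicas read (R) plus the number
written (W) exceeds RF. The Python sketch below encodes that rule for a
single datacenter. The function names are hypothetical, only ONE, TWO,
QUORUM and ALL are modeled, and it assumes every replica still holds what
it acknowledged. An empty replacement node breaks that assumption, so treat
the result as a best case.

```python
"""Sketch: when is a read guaranteed to see the latest acknowledged write?

Assumption: the textbook Dynamo-style overlap rule R + W > RF, single
datacenter, and every replica still holds every write it acknowledged.
Function names are hypothetical, not part of any Cassandra API.
"""


def replicas_for(cl: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    table = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return table[cl]


def read_sees_latest_write(rf: int, write_cl: str, read_cl: str) -> bool:
    """True if every possible read set must intersect every write set."""
    return replicas_for(read_cl, rf) + replicas_for(write_cl, rf) > rf


if __name__ == "__main__":
    for w, r in [("ONE", "ONE"), ("QUORUM", "ONE"), ("QUORUM", "QUORUM")]:
        print(f"RF=3 write={w} read={r}: {read_sees_latest_write(3, w, r)}")
```

With RF=3 this prints False for ONE/ONE and QUORUM/ONE and True for
QUORUM/QUORUM: at CL.ONE the coordinator may ask only the freshly replaced
node, which can return stale or missing data until it has been filled in.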

"rebuild" gets you exactly one replica's worth of data, just like bootstrap
does. If you want to actually sync a node with all of its replicas and
RF>2, you want "repair" and not "rebuild." I wish "rebuild" had been named
something else, because people seem to think it does something it doesn't
do. This reduction in what I call "unique replica count" is why people like
me prefer to back up their nodes with something like tablesnap [2], so that
losing a node does not decrease the "unique replica count." A simpler
solution, if you want to avoid the chance of inconsistency, is to operate
with CL.QUORUM instead of CL.ONE.
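
The difference between "rebuild" and "repair" can be shown with a toy
model. This is a hypothetical Python sketch, not Cassandra code: each
replica is a dict of key -> (timestamp, value), "rebuild" copies a single
source replica, and "repair" merges all replicas with last-write-wins on
timestamp, the way Cassandra reconciles conflicting cells. Tombstones,
Merkle trees and streaming are deliberately ignored.

```python
"""Toy model: "rebuild" copies one replica's view, "repair" merges them all.

Hypothetical names throughout; this is not how Cassandra implements either
operation, only an illustration of what data each one ends up with.
"""

from typing import Dict, List, Tuple

# key -> (write timestamp, value)
Replica = Dict[str, Tuple[int, str]]


def rebuild_from_one(source: Replica) -> Replica:
    """Like rebuild/bootstrap: take everything from a single source replica."""
    return dict(source)


def merge_lww(replicas: List[Replica]) -> Replica:
    """Like repair: reconcile all replicas, keeping the newest timestamp per key."""
    merged: Replica = {}
    for replica in replicas:
        for key, (ts, value) in replica.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


if __name__ == "__main__":
    # RF=3; the dead node's replacement starts empty. Each survivor missed a
    # different write (e.g. writes acknowledged at CL.ONE).
    survivor_a: Replica = {"k1": (10, "old"), "k2": (20, "new")}
    survivor_b: Replica = {"k1": (15, "new"), "k2": (5, "old")}

    rebuilt = rebuild_from_one(survivor_a)
    repaired = merge_lww([survivor_a, survivor_b])

    print("rebuilt :", rebuilt)
    print("repaired:", repaired)
```

Here the rebuilt copy inherits survivor_a's stale k1, while the merged copy
holds the newest value for both keys. That gap is the lost "unique replica
count" in miniature.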

You'd be better off leaving auto_bootstrap set to true and setting
-Dcassandra.replace_address=<IP of the dead node>, which bootstraps you
(from a single-replica source per range) to the token(s) owned by the dead
node. This is exactly like your process above, except that you don't serve
stale reads while doing so.

That said, the single-replica source thing is why people want to first
bootstrap (which does the same single-replica source thing as "rebuild" but
does not serve reads while it does so), then repair, and only then join the
ring. Note that if writes are incoming, this does not actually *close* the
race window for stale reads at ONE; it just makes it much shorter.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-6961
[2] https://github.com/JeremyGrosser/tablesnap
