incubator-cassandra-user mailing list archives

From Matthew Allen <matthew.j.al...@gmail.com>
Subject Re: problem removing dead node from ring
Date Wed, 04 Jun 2014 04:03:55 GMT
| That would work, but until CASSANDRA-6961 [1] there is no way to prevent
| this node from having a long window where it may serve stale reads at CLs
| below QUORUM, until the rebuild completes.

Thanks Robert, this makes perfect sense.  Do you know whether CASSANDRA-6961
will be backported to 1.2.x?

And apologies if these appear to be dumb questions, but is a repair more
suitable than a rebuild because the rebuild only contacts one replica (per
range), which may itself contain stale data?

Thanks

Matt
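
To make the rebuild-versus-repair question concrete, here is a toy Python
sketch. It is only an illustration under stated assumptions, not Cassandra's
streaming or anti-entropy code: each replica is modelled as a dict of
key -> (write_timestamp, value), and the helper names `rebuild` and `repair`
are hypothetical stand-ins for the nodetool operations of the same name.

```python
# Toy model: each replica maps key -> (write_timestamp, value).
# This is NOT Cassandra's streaming or repair code, only an illustration.

def rebuild(source: dict) -> dict:
    """Copy one replica's data verbatim, as a single-source stream would."""
    return dict(source)

def repair(replicas: list) -> None:
    """Bring every replica up to the newest version of each key, in place."""
    merged = {}
    for rep in replicas:
        for key, cell in rep.items():
            # Keep the cell with the highest write timestamp.
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    for rep in replicas:
        rep.update(merged)

# RF=3: replica a saw the latest write to "k"; replica b missed it.
a = {"k": (2, "new")}
b = {"k": (1, "old")}

# The replaced node happens to stream from the stale replica b.
c = rebuild(b)
print(c["k"])   # (1, 'old')

# Comparing all replicas brings every copy up to the newest write.
repair([a, b, c])
print(c["k"])   # (2, 'new')
```

In this model the rebuilt node is only as fresh as the one replica it copied
from, whereas the repair step compares every replica and keeps the newest
version of each key.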




On Wed, Jun 4, 2014 at 11:03 AM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Tue, Jun 3, 2014 at 3:48 PM, Matthew Allen <matthew.j.allen@gmail.com>
> wrote:
>
>> Just out of curiosity, for a dead node, would it be possible to just
>>
>>  - replace the node (no data in data/commit dirs), same IP address, same
>> hostname
>>  - restore the cassandra.yaml (initial_token etc.)
>>  - set auto_bootstrap: false
>>  - start it up and then run a nodetool rebuild?
>>
>> Or would the Host ID value change with the new node ?
>>
>
> That would work, but until CASSANDRA-6961 [1] there is no way to prevent
> this node from having a long window where it may serve stale reads at CLs
> below QUORUM, until the rebuild completes.
>
> "rebuild" gets you exactly one replica's worth of data, just like
> bootstrap does. If RF>2 and you want to actually sync a node with all of
> its replicas, you want "repair" and not "rebuild." I wish "rebuild" had
> been named something else, because people seem to think it does something
> it doesn't do. Streaming from a single replica decreases what I call
> "unique replica count," which is why people like me prefer to back up
> their nodes with something like tablesnap [2], so that losing a node does
> not decrease the "unique replica count." A simpler solution if you want to
> avoid the chance of inconsistency is to operate with CL.QUORUM instead of
> CL.ONE.
>
> You'd be better off leaving auto_bootstrap set to true and setting
> -Dcassandra.replace_address, which bootstraps you (from a single-replica
> source per range) to the token owned by the dead node. This is exactly like
> your process above, except that you don't serve stale reads while doing so.
>
> That said, the single-replica source thing is why people want to first
> bootstrap (which does the same single-replica source thing as "rebuild" but
> does not serve reads while it does so) and then repair and then, finally,
> join the ring. Note that if writes are incoming, this does not actually
> *close* the race window for stale reads at ONE, it just makes it much
> shorter.
>
> =Rob
> [1] https://issues.apache.org/jira/browse/CASSANDRA-6961
> [2] https://github.com/JeremyGrosser/tablesnap
>
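
As a rough illustration of the consistency-level point in the quoted reply,
the Python sketch below enumerates which sets of replicas a read could
contact and reports the sets made up entirely of empty (freshly replaced)
nodes. The replica names, the helper names `quorum` and `stale_read_sets`,
and the assumption that the surviving replicas hold current data are all
hypothetical; this models the arithmetic only, not Cassandra's read path.

```python
from itertools import combinations

def quorum(rf: int) -> int:
    """Quorum size for a replication factor: a strict majority."""
    return rf // 2 + 1

def stale_read_sets(replicas, empty, cl):
    """Return the read sets of size cl that consist only of empty replicas."""
    empty_set = set(empty)
    return [s for s in combinations(replicas, cl) if set(s) <= empty_set]

replicas = ["A", "B", "C"]   # RF=3, hypothetical node names
empty = ["C"]                # C was replaced and has not finished rebuilding

# At CL.ONE a read may land on C alone and return nothing or stale data.
print(stale_read_sets(replicas, empty, 1))          # [('C',)]

# At CL.QUORUM every read set includes at least one surviving replica.
print(stale_read_sets(replicas, empty, quorum(3)))  # []
```

The model assumes A and B are up to date; it says nothing about the case
where the surviving replicas themselves missed a write.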
