incubator-cassandra-user mailing list archives

From "Konstantin Naryshkin" <konstant...@a-bb.net>
Subject Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
Date Thu, 30 Jun 2011 20:47:48 GMT
As I understand it, this has to do with a node being up but missing the delete message (remember,
if you apply the delete at CL.QUORUM, almost half the replicas can miss it and the delete still
succeeds). Imagine that you have 3 nodes A, B, and C, each of which has a column 'foo' with
the value 'bar'. Their state would be:
A: 'foo':'bar'     B: 'foo':'bar'     C: 'foo':'bar'

We attempt to delete column 'foo', and it succeeds on nodes A and B (meaning that we succeeded
on CL.QUORUM). Unfortunately the packet going to node C runs afoul of the network gods and
gets zapped in transit. The state is now:
A: 'foo':deleted     B: 'foo':deleted     C: 'foo':'bar'

If we try a read at this point, at CL.QUORUM, we are guaranteed to get at least one record
that 'foo' was deleted and because of timestamps we know to tell the client as much.
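
To make the timestamp argument concrete, here is a minimal sketch in Python. It is a toy model,
not Cassandra code: the (timestamp, value) tuples, the TOMBSTONE marker, and the reconcile()
helper are all assumptions made up for illustration.

```python
# Toy model, not Cassandra code: each replica holds (timestamp, value) for 'foo'.
# A tombstone is modeled as the value None carrying the timestamp of the delete.
TOMBSTONE = None

def reconcile(responses):
    """Return the response with the highest timestamp (last write wins)."""
    return max(responses, key=lambda cell: cell[0])

a = (2, TOMBSTONE)   # delete applied at timestamp 2
b = (2, TOMBSTONE)   # delete applied at timestamp 2
c = (1, 'bar')       # missed the delete, still holds the write from timestamp 1

# Any two of the three replicas form a quorum, and every pair includes A or B,
# so the newer tombstone always wins over C's stale value.
quorum_results = [reconcile(pair) for pair in [(a, b), (a, c), (b, c)]]
```

Every entry in quorum_results is the tombstone, which is why the client is told the column is
gone no matter which two replicas answer.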

After GCGraceSeconds and a compaction, the state of the nodes will be:
A: None     B: None     C: 'foo':'bar'

Some time later, we attempt a read and just happen to get C's response. The response
will be that 'foo' is storing 'bar', because A and B no longer hold a tombstone with a newer
timestamp to override it. Not only that, but read repair happens as well, so the
state will become:
A: 'foo':'bar'     B: 'foo':'bar'     C: 'foo':'bar'

We have the infamous undelete.
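
The whole sequence can be replayed with a small Python sketch. Again this is a toy model under
stated assumptions, not how Cassandra is implemented: GC_GRACE, compact(), repair(), and run()
are hypothetical names, time is in arbitrary units, and repair() simply copies the newest cell
per key to every replica (the end result that nodetool repair or read repair aims for).

```python
# Toy model of the scenario above; the names are illustrative, not Cassandra APIs.
GC_GRACE = 10  # stand-in for GCGraceSeconds, in arbitrary time units

def compact(replica, now):
    """Drop tombstones (value None) older than GC_GRACE; keep everything else."""
    return {k: (ts, v) for k, (ts, v) in replica.items()
            if not (v is None and now - ts > GC_GRACE)}

def repair(replicas):
    """Give every replica the newest cell per key (highest timestamp wins)."""
    merged = {}
    for r in replicas:
        for k, cell in r.items():
            if k not in merged or cell[0] > merged[k][0]:
                merged[k] = cell
    return [dict(merged) for _ in replicas]

def run(repair_in_time):
    a, b, c = ({'foo': (1, 'bar')} for _ in range(3))
    a['foo'] = b['foo'] = (2, None)        # the delete reaches A and B only
    nodes = [a, b, c]
    if repair_in_time:
        nodes = repair(nodes)              # C learns the tombstone before it expires
    nodes = [compact(n, now=20) for n in nodes]  # GC_GRACE has passed
    return repair(nodes)                   # a later read triggers read repair

undeleted = run(repair_in_time=False)  # 'foo':'bar' is back on all three nodes
clean = run(repair_in_time=True)       # 'foo' is gone everywhere
```

In this model the only difference between the two runs is whether C receives the tombstone
before compaction throws it away, which is the reason repair has to run within GCGraceSeconds.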

----- Original Message -----
From: "A J" <s5alye@gmail.com>
To: user@cassandra.apache.org
Sent: Thursday, June 30, 2011 8:25:29 PM
Subject: Meaning of 'nodetool repair has to run within GCGraceSeconds'

I am a little confused about the reason why nodetool repair has to run
within GCGraceSeconds.

The documentation at:
http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
is not very clear to me.

How can a delete be 'unforgotten' if I don't run nodetool repair? (I
understand that if a node is down for more than GCGraceSeconds, I
should not bring it back up without resyncing it completely. Otherwise
deletes may reappear: http://wiki.apache.org/cassandra/DistributedDeletes )
But I am not sure how exactly nodetool repair ties into this mechanism of
distributed deletes.

Thanks for any clarifications.
