cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "Operations" by PeterSchuller
Date Mon, 10 Jan 2011 22:20:15 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Operations" page has been changed by PeterSchuller.
The comment on this change is: Document how to deal with lack of repair within GCGraceSeconds.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=75&rev2=76

--------------------------------------------------

  === Frequency of nodetool repair ===
  
  Unless your application performs no deletes, it is vital that production clusters run `nodetool repair` periodically on all nodes in the cluster. The hard requirement for repair frequency is the value used for GCGraceSeconds (see [[DistributedDeletes]]). Running `nodetool repair` often enough to guarantee that every node has performed a repair within any period GCGraceSeconds long ensures that deletes are not "forgotten" in the cluster.
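
For example, one simple way to meet this requirement is a per-node cron entry (a minimal sketch; the schedule, host, and log path are assumptions to adapt to your cluster):

{{{
# Run a full repair on this node every Sunday at 02:00. Stagger the
# day/hour per node so repairs do not all run cluster-wide at once.
# Assumes nodetool is on the PATH and JMX listens on the default port.
0 2 * * 0  nodetool -h localhost repair >> /var/log/cassandra/repair.log 2>&1
}}}

With the default GCGraceSeconds of 864000 seconds (10 days), a weekly repair leaves several days of headroom should a run fail and need to be retried.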
+ 
+ ==== Dealing with the consequences of nodetool repair not running within GCGraceSeconds ====
+ 
+ If `nodetool repair` has not been run often enough, to the point that more than GCGraceSeconds has passed since the last repair, you risk forgotten deletes (see [[DistributedDeletes]]). In addition to deleted data popping back up, you may see inconsistencies between the data returned by different nodes that will not self-heal by read repair or further `nodetool repair`. Further details on this latter effect are documented in [[https://issues.apache.org/jira/browse/CASSANDRA-1316|CASSANDRA-1316]].
+ 
+ There are at least three ways to deal with this scenario.
+ 
+ #1 Treat the node in question as failed, and replace it as described further below.
+ #2 To minimize the number of forgotten deletes, first increase GCGraceSeconds across the cluster (rolling restart required), perform a full repair on all nodes, and then change GCGraceSeconds back again. This has the advantage of ensuring that tombstones spread as much as possible, minimizing the amount of data that may "pop back up" (forgotten deletes).
+ #3 Yet another option, which will result in more forgotten deletes than the previous suggestion but is easier to carry out, is to ensure `nodetool repair` has been run on all nodes and then perform a compaction to expire tombstones. Following this, read repair and regular `nodetool repair` should cause the cluster to converge (see the sketch after this list).
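
As a minimal sketch of option #3, assuming SSH access from an admin host and that the node list below is replaced with your cluster's hosts:

{{{
#!/bin/sh
# Hypothetical node list; substitute the hosts in your cluster.
NODES="cass1 cass2 cass3"

# First make sure every node has completed a repair...
for n in $NODES; do
    ssh "$n" nodetool -h localhost repair
done

# ...and only then run a major compaction on each node to expire tombstones.
for n in $NODES; do
    ssh "$n" nodetool -h localhost compact
done
}}}

The ordering matters: compacting before every node has been repaired could expire tombstones that have not yet reached all replicas, which is exactly the forgotten-delete scenario being fixed.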
  
  === Handling failure ===
  If a node goes down and comes back up, the ordinary repair mechanisms will be adequate to deal with any inconsistent data. Remember though that if a node misses updates and is not repaired for longer than your configured GCGraceSeconds (default: 10 days), it could have missed remove operations permanently. Unless your application performs no removes, you should wipe its data directory, re-bootstrap it, and `removetoken` its old entry in the ring (see below).
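
As a minimal sketch of that procedure, assuming default data directory locations and that the node's old token is known from `nodetool ring` (both assumptions):

{{{
# On the failed node: stop Cassandra, then wipe its data directories
# (paths assume a default install; adjust to your configuration).
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*

# Ensure the node is configured to bootstrap on startup (AutoBootstrap or
# auto_bootstrap set to true, depending on version), then restart it so it
# streams its ranges back from the other replicas.

# From any live node: remove the old node's entry from the ring, where
# <old-token> is the token it previously held.
nodetool -h localhost removetoken <old-token>
}}}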
