cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "Operations" by JonathanEllis
Date Mon, 04 Jan 2010 01:29:15 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: explain repair/bootstrap options more clearly.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=17&rev2=18

--------------------------------------------------

   1. Anti-Entropy: when `nodeprobe repair` is run, Cassandra performs a major compaction,
computes a Merkle Tree of the data on that node, and compares it with the versions on other
replicas, to catch any out of sync data that hasn't been read recently.  This is intended
to be run infrequently (e.g., weekly) since major compaction is relatively expensive.
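
To illustrate the Merkle Tree comparison described above, here is a minimal Python sketch. It is a toy model, not Cassandra's implementation: the names `build_merkle` and `differing_leaves`, the bucketing of rows by key hash, the use of SHA-256, and the fixed leaf count (assumed to be a power of two) are all assumptions made for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an arbitrary choice for this sketch)."""
    return hashlib.sha256(data).digest()


def build_merkle(rows, leaves=4):
    """Build a toy Merkle tree over a dict of key -> value.

    Rows are grouped into `leaves` buckets by key hash (an assumption;
    a real system would group by token range). `leaves` must be a power
    of two. Returns a list of levels, leaf hashes first, root last.
    """
    buckets = [[] for _ in range(leaves)]
    for key in sorted(rows):
        idx = int.from_bytes(_h(key.encode())[:4], "big") % leaves
        buckets[idx].append(f"{key}={rows[key]}")
    level = [_h("\n".join(bucket).encode()) for bucket in buckets]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(tree_a, tree_b):
    """Return indexes of leaf buckets whose hashes differ.

    If the roots match, the replicas agree and nothing needs comparing.
    """
    if tree_a[-1] == tree_b[-1]:
        return []
    return [i for i, (a, b) in enumerate(zip(tree_a[0], tree_b[0])) if a != b]


# Two replicas that disagree on a single row.
local = {"k1": "v1", "k2": "v2", "k3": "v3"}
remote = dict(local, k2="stale")
print(differing_leaves(build_merkle(local), build_merkle(remote)))
```

Only the bucket holding the out-of-sync row is reported, so only that portion of the data would need to be exchanged between replicas.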
  
  === Handling failure ===
- If a node goes down and comes back up, the ordinary repair mechanisms will be adequate to
deal with any inconsistent data.  If a node goes down entirely, you should be aware of the
following as well:
+ If a node goes down and comes back up, the ordinary repair mechanisms will be adequate to
deal with any inconsistent data.  If a node goes down entirely, then you have two options:
  
-  1. Remove the old node from the ring first, or bring up a replacement node with the same
IP and Token as the old; otherwise, the old node will stay part of the ring in a "down" state,
which will degrade your replication factor for the affected Range
+  1. Bring up a replacement node with the same IP and Token as the old, and run `nodeprobe
repair`.  Until the repair process is complete, clients reading only from this node may get
no data back.  Using a higher !ConsistencyLevel on reads will avoid this.
    * If you don't know the Token of the old node, you can retrieve it from any of the other
nodes' `system` keyspace, !ColumnFamily `LocationInfo`, key `L`.
    * You can also run `nodeprobe ring` to look up a node's token (unless there was some kind
of outage, and the others came up but not the down one).
-  1. Removing the old node, then bootstrapping the new one, may be more performant than using
Anti-Entropy.  Testing needed.
-   * Even brute-force rsyncing of data from the relevant replicas and running cleanup on
the replacement node may be more performant
+  1. Remove the old token ring entry with `nodeprobe removetoken`.
+   * Optionally, bootstrap a new node at either the old node's location (using the InitialToken
configuration directive) or at an automatically determined one.  Since a bootstrapping node
does not advertise itself as available for reads until it has all the data for its ranges
transferred, this avoids the problem of clients reading at !ConsistencyLevel.ONE seeing empty
replies.  This may also be more performant than using the `nodeprobe repair` approach; testing
needed.
+ 
+ Do not leave the old node permanently in the token ring as "Down"; when it is in this state
the cluster thinks it may eventually come back up with its old data, and will not re-replicate
the data it was responsible for elsewhere.
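
The effect of the read ConsistencyLevel on an empty replacement node (the first option above) can be sketched with a toy model in Python. This is not Cassandra's read path: the `Replica` class, the `read` function, and the `(value, timestamp)` tuples are hypothetical names and shapes chosen for illustration, and `required` stands in for the number of replicas a given ConsistencyLevel consults.

```python
class Replica:
    """A toy replica holding key -> (value, timestamp)."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})

    def get(self, key):
        # Returns (value, timestamp), or None if this replica has no data.
        return self.data.get(key)


def read(replicas, key, required):
    """Consult the first `required` replicas and return the newest value.

    required=1 models a read at ConsistencyLevel.ONE; a larger number
    models a higher level. Returns None if no consulted replica has data.
    """
    answers = [replica.get(key) for replica in replicas[:required]]
    answers = [answer for answer in answers if answer is not None]
    if not answers:
        return None
    return max(answers, key=lambda answer: answer[1])[0]


# A replacement node that has not finished repair, plus two healthy replicas.
replacement = Replica("replacement")
healthy_a = Replica("a", {"user:1": ("alice", 100)})
healthy_b = Replica("b", {"user:1": ("alice", 100)})
ring = [replacement, healthy_a, healthy_b]

print(read(ring, "user:1", required=1))  # only the empty node is consulted
print(read(ring, "user:1", required=2))  # a healthy replica is also consulted
```

With `required=1` the read is served by the empty replacement node alone and comes back with no data; with `required=2` at least one replica that still holds the row is consulted, so the value is returned.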
  
  == Backing up data ==
  Cassandra can snapshot data while online using `nodeprobe snapshot`.  You can then back
up those snapshots using any desired system, although leaving them where they are is probably
the option that makes the most sense on large clusters.
