cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "Operations" by scott white
Date Tue, 09 Feb 2010 00:25:05 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Operations" page has been changed by scott white.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=28&rev2=29

--------------------------------------------------

  The following applies to Cassandra 0.5.
  
  == Hardware ==
- 
- See [[CassandraHardware]]
+ See CassandraHardware
  
  == Tuning ==
- 
- See [[PerformanceTuning]]
+ See PerformanceTuning
  
  == Ring management ==
  Each Cassandra server [node] is assigned a unique Token that determines what keys it is
the primary replica for.  If you sort all nodes' Tokens, the Range of keys each is responsible
for is (!PreviousToken, !MyToken], that is, from the previous token (exclusive) to the node's
token (inclusive).  The machine with the lowest Token gets both all keys less than that token,
and all keys greater than the largest Token; this is called a "wrapping Range."
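 
 As a concrete illustration (the token values here are made up, purely to show the range arithmetic): suppose four nodes A, B, C, D hold tokens 10, 20, 30, 40 respectively.
 
 {{{
 Node  Token  Primary Range
 A     10     (40, 10]   <- the wrapping Range: keys > 40 plus keys <= 10
 B     20     (10, 20]
 C     30     (20, 30]
 D     40     (30, 40]
 }}}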
@@ -31, +29 @@

   * !RackAwareStrategy: replica 2 is placed on the first node along the ring that belongs to a data center '''other''' than the first replica's; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the '''same''' rack as the first
  
  Note that with !RackAwareStrategy, succeeding nodes along the ring should alternate data
centers to avoid hot spots.  For instance, if you have nodes A, B, C, and D in increasing
Token order, and instead of alternating you place A and B in DC1, and C and D in DC2, then
nodes C and A will have disproportionately more data on them because they will be the replica
destination for every Token range in the other data center.
+ 
   * The corollary to this is, if you want to start with a single DC and add another later,
when you add the second DC you should add as many nodes as you have in the first rather than
adding a node or two at a time gradually.
  
  Replication strategy is not intended to be changed once live, but if you are sufficiently
motivated it can be done with some manual effort:
+ 
   1. anticompact each node's primary Range, yielding sstables containing only that Range's data
   1. copy those sstables to the nodes responsible for extra replicas under the new strategy
   1. change the strategy and restart
@@ -41, +41 @@

  Replication factor is not really intended to be changed in a live cluster either, but increasing
it may be done if you (a) use ConsistencyLevel.QUORUM or ALL (depending on your existing replication
factor) to make sure that a replica that actually has the data is consulted, (b) are willing
to accept downtime while anti-entropy repair runs (see below), or (c) are willing to live
with some clients potentially being told no data exists if they read from the new replica
location(s) until repair is done.
  
  Reducing replication factor is easily done and only requires running cleanup afterwards
to remove extra replicas.
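 
 A minimal command sketch of the two cases, using only the nodetool commands mentioned on this page (run them against each affected node; host/port options depend on your nodetool version):
 
 {{{
 # after raising the replication factor: repair so the new replica locations receive data
 bin/nodetool repair
 
 # after lowering the replication factor: cleanup to remove the now-extra replicas
 bin/nodetool cleanup
 }}}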
-  
+ 
  === Network topology ===
- 
  Besides datacenters, you can also tell Cassandra which nodes are in the same rack within
a datacenter.  Cassandra will use this to route both reads and data movement for Range changes
to the nearest replicas.  This is configured by a user-pluggable !EndpointSnitch class in
the configuration file.
  
 !EndpointSnitch is related to, but distinct from, replication strategy itself: !RackAwareStrategy needs a properly configured Snitch to place replicas correctly, but even absent a Strategy that cares about datacenters, the rest of Cassandra will still be location-sensitive.
@@ -51, +50 @@

  There is an example of a custom Snitch implementation in https://svn.apache.org/repos/asf/incubator/cassandra/trunk/contrib/property_snitch/.
  
  == Range changes ==
- 
  === Bootstrap ===
  Adding new nodes is called "bootstrapping."
  
@@ -62, +60 @@

  Important things to note:
  
   1. You should wait long enough for all the nodes in your cluster to become aware of the
bootstrapping node via gossip before starting another bootstrap.  For most clusters 30s will
be plenty of time.
-  1. Automatically picking a Token only allows doubling your cluster size at once; for more
than that, let the first group finish before starting another.
+  1. Relating to point 1, one can only bootstrap N nodes at a time with automatic token picking, where N is the size of the existing cluster. If you need to more than double the size of your cluster, you have to wait for the first N nodes to finish until your cluster is size 2N before bootstrapping more nodes. So if your current cluster is 5 nodes and you want to add 7 nodes, bootstrap 5 and let those finish before bootstrapping the last two.
   1. As a safety measure, Cassandra does not automatically remove data from nodes that "lose" part of their Token Range to a newly added node.  Run "nodetool cleanup" on the source node(s) when you are satisfied the new node is up and working (see the example after this list). If you do not do this, the old data will still be counted against the load on that node, and future bootstrap attempts at choosing a location will be thrown off.
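 
 The example referred to in the last point, as a minimal sketch (run the commands from wherever you normally run nodetool; host/port options are omitted here):
 
 {{{
 # check that the cluster sees the new node and its Token
 bin/nodetool ring
 
 # once you are satisfied the new node is up and working, on each source node
 # that lost part of its Range:
 bin/nodetool cleanup
 }}}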
  
  Cassandra is smart enough to transfer data from the nearest source node(s), if your !EndpointSnitch
is configured correctly.  So, the new node doesn't need to be in the same datacenter as the
primary replica for the Range it is bootstrapping into, as long as another replica is in the
datacenter with the new one.
@@ -93, +91 @@

  If a node goes down entirely, then you have two options:
  
   1. (Recommended approach) Bring up the replacement node with a new IP address, and !AutoBootstrap set to true in storage-conf.xml. This will place the replacement node in the cluster and find the appropriate position automatically. Then the bootstrap process begins. While this process runs, the node will not receive reads until finished. Once this process is finished on the replacement node, run `nodetool removetoken` once, supplying the token of the dead node, and `nodetool cleanup` on each node (see the command sketch below).
-  * You can obtain the dead node's token by running `nodetool ring` on any live node, unless
there was some kind of outage, and the others came up but not the down one -- in that case,
you can retrieve the token from the live nodes' system tables.
+  1. You can obtain the dead node's token by running `nodetool ring` on any live node, unless
there was some kind of outage, and the others came up but not the down one -- in that case,
you can retrieve the token from the live nodes' system tables.
  
-  1. (Alternative approach) Bring up a replacement node with the same IP and token as the
old, and run `nodetool repair`. Until the repair process is complete, clients reading only
from this node may get no data back.  Using a higher !ConsistencyLevel on reads will avoid
this. 
+  1. (Alternative approach) Bring up a replacement node with the same IP and token as the
old, and run `nodetool repair`. Until the repair process is complete, clients reading only
from this node may get no data back.  Using a higher !ConsistencyLevel on reads will avoid
this.
  
  The reason why you run `nodetool cleanup` on all live nodes is to remove old Hinted Handoff
writes stored for the dead node.
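 
 The command sketch for the recommended approach, with the token value as a placeholder (obtain the real token from `nodetool ring` or the live nodes' system tables, as described above):
 
 {{{
 # on any live node: find the dead node's token
 bin/nodetool ring
 
 # after the replacement node (new IP, AutoBootstrap set to true) finishes bootstrapping:
 bin/nodetool removetoken <token-of-dead-node>
 
 # then run cleanup on each live node to drop old Hinted Handoff writes for the dead node
 bin/nodetool cleanup
 }}}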
  
@@ -112, +110 @@

  {{{
  Usage: sstable2json [-f outfile] <sstable> [-k key [-k key [...]]]
  }}}
- 
 `bin/sstable2json` accepts as a required argument the full path to an SSTable data file (a file ending in -Data.db), and an optional argument for an output file (by default, output is written to stdout). You can also pass the names of specific keys using the `-k` argument to limit what is exported.
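 
 A hypothetical invocation following the usage line above (the keyspace, column family, key, and paths are made up):
 
 {{{
 bin/sstable2json -f /tmp/Standard1-dump.json \
     /var/lib/cassandra/data/Keyspace1/Standard1-1-Data.db -k jsmith
 }}}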
  
 Note: If you are not running the exporter against in-place SSTables on a node (for example, you have copied them elsewhere), there are a couple of things to keep in mind (an example layout follows the list).
+ 
   1. The corresponding configuration must be present (same as it would be to run a node).
-  2. SSTables are expected to be in a directory named for the keyspace (same as they would
be on a production node).
+  1. SSTables are expected to be in a directory named for the keyspace (same as they would
be on a production node).
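 
 For example, if the SSTables were copied off a node, a layout like the following (hypothetical paths; companion file names as used by 0.5-era SSTables) satisfies the second point:
 
 {{{
 /some/export/dir/Keyspace1/Standard1-1-Data.db
 /some/export/dir/Keyspace1/Standard1-1-Index.db
 /some/export/dir/Keyspace1/Standard1-1-Filter.db
 }}}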
  
  JSON exported SSTables can be "imported" to create new SSTables using `bin/json2sstable`:
  
  {{{
  Usage: json2sstable -K keyspace -c column_family <json> <sstable>
  }}}
- 
  `bin/json2sstable` takes arguments for keyspace and column family names, and full paths
for the JSON input file and the destination SSTable file name.
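 
 A hypothetical invocation matching the usage line above (keyspace, column family, and paths are made up):
 
 {{{
 bin/json2sstable -K Keyspace1 -c Standard1 \
     /tmp/Standard1-dump.json /some/export/dir/Keyspace1/Standard1-2-Data.db
 }}}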
  
  You can also import pre-serialized rows of data using the BinaryMemtable interface.  This
is useful for importing via Hadoop or another source where you want to do some preprocessing
of the data to import.
@@ -136, +133 @@

  
 Important metrics to watch on a per-Column Family basis would be: '''Read Count, Read Latency, Write Count and Write Latency'''. '''Pending Tasks''' tell you if things are backing up. These metrics can also be exposed using any JMX client such as `jconsole`.
  
- You can also use jconsole, and the MBeans tab to look at PendingTasks for thread pools.
If you see one particular thread backing up, this can give you an indication of a problem.
One example would be ROW-MUTATION-STAGE indicating that write requests are arriving faster
than they can be handled. A more subtle example is the FLUSH stages: if these start backing
up, cassandra is accepting writes into memory fast enough, but the sort-and-write-to-disk
stages are falling behind. 
+ You can also use jconsole's MBeans tab to look at PendingTasks for thread pools. If you see one particular thread backing up, this can give you an indication of a problem. One example would be ROW-MUTATION-STAGE indicating that write requests are arriving faster than they can be handled. A more subtle example is the FLUSH stages: if these start backing up, Cassandra is accepting writes into memory fast enough, but the sort-and-write-to-disk stages are falling behind.
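 
 For example, to point jconsole at a node remotely (host and JMX port are placeholders; use whatever your node exposes):
 
 {{{
 jconsole <cassandra-host>:<jmx-port>
 }}}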
  
 If you are seeing a lot of tasks building up, your hardware or configuration tuning is probably the bottleneck.
  
