cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "ArchitectureInternals" by TylerHobbs
Date Thu, 20 Aug 2015 15:09:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "ArchitectureInternals" page has been changed by TylerHobbs:
https://wiki.apache.org/cassandra/ArchitectureInternals?action=diff&rev1=33&rev2=34

Comment:
Fix description of how batchlog nodes are chosen

    * If nodes are changing position on the ring, "pending ranges" are associated with their
destinations in !TokenMetadata and these are also written to.
    * ConsistencyLevel determines how many replies to wait for.  See !WriteResponseHandler.determineBlockFor.
 Interaction with pending ranges is a bit tricky; see https://issues.apache.org/jira/browse/CASSANDRA-833
    * If the FailureDetector says that we don't have enough nodes alive to satisfy the ConsistencyLevel,
we fail the request with !UnavailableException
-   * When performing atomic batches, the mutations are written to the batchlog on the two
closest nodes in the local datacenter that are alive. If only one other node is alive, it
alone will be used, but if no other nodes are alive, an UnavailableException will be returned.
 If the cluster has only one node, it will write the batchlog entry itself.  The batchlog
is contained in the system.batchlog table.
+   * When performing atomic batches, the mutations are written to the batchlog on two live
nodes in the local datacenter. If the local datacenter contains multiple racks, the nodes
will be chosen from two separate racks that are different from the coordinator's rack, when
possible.  If only one other node is alive, it alone will be used, but if no other nodes are
alive, an UnavailableException will be returned unless the consistency level is ANY.  If the
cluster has only one node, it will write the batchlog entry itself.  The batchlog is contained
in the system.batchlog table.
    * If the FD gives us the okay but writes time out anyway because of a failure after the
request is sent or because of an overload scenario, !StorageProxy will write a "hint" locally
to replay the write once the timed-out replica(s) recover.  This is called HintedHandoff.
 Note that HH does not prevent inconsistency entirely; either unclean shutdown or hardware
failure can prevent the coordinating node from writing or replaying the hint. ArchitectureAntiEntropy
is responsible for restoring consistency more completely.
    * Cross-datacenter writes are not sent directly to each replica; instead, they are sent
to a single replica with a parameter in !MessageOut telling that replica to forward to the
other replicas in that datacenter; those replicas will respond directly to the original coordinator.
   * On the destination node, !RowMutationVerbHandler calls RowMutation.apply() (which calls
Keyspace.apply()) to make the mutation.  This has several steps.  First, an entry is appended
to the CommitLog (potentially blocking if the CommitLog is in batch sync mode, or if the queue
is full in periodic sync mode).  Next, the Memtable, secondary indexes (if applicable), and
row cache are updated (sequentially) for each ColumnFamily in the mutation.
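The consistency-level bullet above can be sketched as follows. This is an illustrative Python sketch of the ack-counting idea behind !WriteResponseHandler.determineBlockFor, not Cassandra's actual implementation; the function name and string levels are hypothetical:

```python
def block_for(consistency_level, replication_factor):
    """How many replica acks the coordinator waits for (sketch).

    QUORUM is a strict majority of the replicas: floor(RF / 2) + 1.
    """
    if consistency_level == "ONE":
        return 1
    if consistency_level == "TWO":
        return 2
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("level not covered in this sketch")
```

For example, with a replication factor of 3, QUORUM waits for 2 acks; if the FailureDetector reports fewer than that many replicas alive, the request fails with !UnavailableException up front.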
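The rack-aware batchlog endpoint selection described above (two live local-DC nodes, preferably from two racks other than the coordinator's) can be sketched like this. This is a toy illustration of the stated policy, with made-up names and tie-breaking; it is not Cassandra's actual selection code:

```python
import random

def choose_batchlog_endpoints(live_nodes, coordinator_rack):
    """Pick up to two live local-DC nodes for the batchlog (sketch).

    live_nodes: list of (node, rack) pairs in the local datacenter,
    excluding the coordinator itself.
    """
    by_rack = {}
    for node, rack in live_nodes:
        by_rack.setdefault(rack, []).append(node)
    # Prefer racks different from the coordinator's rack.
    other = {r: ns for r, ns in by_rack.items() if r != coordinator_rack}
    if len(other) >= 2:
        # One node from each of two distinct non-coordinator racks.
        racks = random.sample(sorted(other), 2)
        return [random.choice(other[r]) for r in racks]
    # Fewer than two other racks: fall back to any live nodes, up to two.
    all_nodes = [n for n, _ in live_nodes]
    return random.sample(all_nodes, min(2, len(all_nodes)))
```

If the returned list is empty and the consistency level is not ANY, the coordinator would fail the batch with !UnavailableException, per the bullet above.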
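The HintedHandoff bullet above boils down to "store the mutation locally on timeout, replay it when the replica comes back." A minimal sketch of that idea, with hypothetical names (the real logic lives in !StorageProxy and the hints subsystem):

```python
class HintStoreSketch:
    """Toy model of hinted handoff on a coordinator (sketch, not Cassandra code)."""

    def __init__(self):
        self.hints = {}  # replica -> list of pending mutations

    def on_write_timeout(self, replica, mutation):
        # The write to this replica timed out; keep a hint locally.
        self.hints.setdefault(replica, []).append(mutation)

    def on_replica_alive(self, replica, send):
        # Replica is back (per the failure detector); replay its hints.
        for mutation in self.hints.pop(replica, []):
            send(replica, mutation)
```

As the text notes, this is best-effort: if the coordinator itself dies before writing or replaying the hint, the hint is lost, which is why anti-entropy repair is still needed.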
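The destination-node apply sequence in the last bullet (commit log first for durability, then in-memory structures) can be sketched as below. Class and attribute names are illustrative only; the real path is RowMutation.apply() calling Keyspace.apply():

```python
class KeyspaceSketch:
    """Toy sketch of the apply() ordering: commit log, then memtables."""

    def __init__(self):
        self.commit_log = []   # stand-in for the durable CommitLog
        self.memtables = {}    # column family -> {column: value}

    def apply(self, mutation):
        # mutation: {column_family: {column: value}}
        # 1. Append to the commit log before touching any in-memory state,
        #    so the write survives a crash and can be replayed.
        self.commit_log.append(mutation)
        # 2. Update the memtable (and, in the real system, secondary
        #    indexes and the row cache) for each column family, in order.
        for cf, columns in mutation.items():
            self.memtables.setdefault(cf, {}).update(columns)
```

The ordering is the important part: a mutation reflected in a memtable but absent from the commit log could be lost on an unclean shutdown, so the log append always comes first.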
