cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "WritePathForUsers" by MichaelEdge
Date Wed, 02 Dec 2015 05:55:49 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "WritePathForUsers" page has been changed by MichaelEdge:
https://wiki.apache.org/cassandra/WritePathForUsers?action=diff&rev1=29&rev2=30

  = Cassandra Write Path =
  This section provides an overview of the Cassandra Write Path for users of Cassandra. Cassandra
developers, who work on the Cassandra source code, should refer to the [[ArchitectureInternals|Architecture
Internals]] developer documentation for a more detailed overview.
  
- {{attachment:CassandraWritePath.png|text describing image|width=800}}
+ {{attachment:CassandraWritePath.png|Cassandra Write Path|width=800}}
  
  == The Local Coordinator ==
  The local coordinator receives the write request from the client and performs the following:
    1. Firstly, the local coordinator determines which nodes are responsible for storing the
data:
-     * The first replica is chosen based on hashing the primary key using the Partitioner;
the Murmur3Partitioner is the default.
+     * The first replica is chosen based on hashing the primary key using the Partitioner;
Murmur3Partitioner is the default.
-     * Other replicas are chosen based on the replication strategy defined for the keyspace.
In a production cluster this is most likely the NetworkTopologyStrategy.
+     * Other replicas are chosen based on the replication strategy defined for the keyspace.
In a production cluster this is most likely the NetworkTopologyStrategy (see the example below).
+   1. The local coordinator determines whether the write request would modify an associated
materialized view. 
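+ For illustration, here is a minimal CQL sketch of how replica placement is configured; the keyspace name, data centre name, table and column names below are hypothetical:
+ {{{
+ -- Hypothetical keyspace: each write is stored on 3 replica nodes in data centre DC1
+ -- (additional data centres and replication factors can be listed in the same map).
+ -- The first replica is determined by hashing the partition key with the partitioner
+ -- (Murmur3Partitioner by default); NetworkTopologyStrategy places the remaining replicas.
+ CREATE KEYSPACE my_keyspace WITH replication = {
+   'class': 'NetworkTopologyStrategy',
+   'DC1': 3
+ };
+
+ -- token() shows the hash value the partitioner derives from a partition key
+ -- (assuming a hypothetical users table whose partition key is user_id).
+ SELECT token(user_id), user_id FROM my_keyspace.users WHERE user_id = 42;
+ }}}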
+ === If the write request modifies a materialized view ===
+ When using materialized views it’s important to ensure that the base table and materialized
view are consistent, i.e. all changes applied to the base table MUST be applied to the materialized
view. Cassandra uses a two-stage batch log process for this: 
+  * one batch log on the local coordinator, ensuring that the update to the base table is applied
on a Quorum of replica nodes
+  * one batch log on each replica node, ensuring the update is applied to the corresponding
materialized view.
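+ As a concrete (hypothetical) sketch of the kind of base table / materialized view pair that must be kept consistent:
+ {{{
+ -- Hypothetical base table.
+ CREATE TABLE my_keyspace.users (
+   user_id int PRIMARY KEY,
+   email   text,
+   name    text
+ );
+
+ -- Materialized view over the same data, partitioned by email.
+ -- Every write applied to users must also be reflected here.
+ CREATE MATERIALIZED VIEW my_keyspace.users_by_email AS
+   SELECT user_id, email, name FROM my_keyspace.users
+   WHERE email IS NOT NULL AND user_id IS NOT NULL
+   PRIMARY KEY (email, user_id);
+ }}}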
+ The process on the local coordinator is as follows:
+   1. Create a batch log. The batch log ensures that changes are applied to a Quorum of replica
nodes, regardless of the consistency level of the write request. Acknowledgement to the client
is still based on the write request consistency level.
    1. The write request is then sent to all replica nodes simultaneously.
+ === If the write request does not modify a materialized view ===
+   1. The write request is then sent to all replica nodes simultaneously.
-   1. The total number of nodes receiving the write request is determined by the replication
factor for the keyspace.
+ In both cases the total number of nodes receiving the write request is determined by the
replication factor for the keyspace.
  
  == Replica Nodes ==
  Replica nodes receive the write request from the local coordinator and perform the following:
@@ -21, +30 @@

   1. If row caching is used, invalidate the cache for that row. The row cache is populated only
on reads, so it must be invalidated when data for that row is written.
   1. Acknowledge the write request back to the local coordinator.
  The local coordinator waits for the appropriate number of acknowledgements from the replica
nodes (dependent on the consistency level for this write request) before acknowledging back
to the client.
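+ For example (a cqlsh sketch, using the hypothetical users table above in a keyspace with a total replication factor of 3):
+ {{{
+ -- Ask the coordinator to wait for a majority of replicas (2 of 3) before acknowledging.
+ CONSISTENCY QUORUM;
+
+ INSERT INTO my_keyspace.users (user_id, email, name) VALUES (42, 'ada@example.com', 'Ada');
+ -- The write is still sent to all 3 replicas; QUORUM only controls how many acknowledgements
+ -- the coordinator waits for before replying to the client.
+ }}}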
+ === If the write request modifies a materialized view ===
+ Keeping a materialized view in sync with its base table adds more complexity to the write
path and also incurs performance overheads on the replica node in the form of read-before-write,
locks and batch logs.
+  1. The replica node acquires a lock on the partition, to ensure that write requests are
serialised and applied to the base table and materialized views in order.
+  1. The replica node reads the partition data and constructs the set of deltas to be applied
to the materialized view. One insert/update/delete to the base table may result in many inserts/updates/deletes
to the associated materialized view.
+  1. Write data to the Commit Log. 
+  1. Create batch log containing updates to the materialized view. The batch log ensures
the set of updates to the materialized view is atomic, and is part of the mechanism that ensures
base table and materialized view are kept consistent. 
+  1. Store the batch log containing the materialized view updates on the local replica node.
+  1. Send materialized view updates asynchronously to the materialized view replica (note:
the materialized view may be stored on the same replica node as the base table, or on a different
one).
+  1. Write data to the MemTable.
+  1. The materialized view replica node will apply the update and return an acknowledgement
to the base table replica node.
+  1. The same process takes place on each replica node that stores the data for the partition
key.
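+ To illustrate why a single base-table write can fan out into several materialized view operations, consider the hypothetical users / users_by_email pair from above:
+ {{{
+ -- Existing base row: (user_id=42, email='ada@example.com', name='Ada').
+ -- Changing email (the view's partition key) forces the replica to read the old row
+ -- and apply two deltas to users_by_email:
+ --   a delete of the view row keyed by ('ada@example.com', 42)
+ --   an insert of a new view row keyed by ('new@example.com', 42)
+ UPDATE my_keyspace.users SET email = 'new@example.com' WHERE user_id = 42;
+ }}}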
+ 
  == Flushing MemTables ==
  MemTables are flushed to disk based on various factors, including:
   * commitlog_total_space_in_mb is exceeded
