cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "WritePathForUsers" by MichaelEdge
Date Mon, 30 Nov 2015 06:40:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "WritePathForUsers" page has been changed by MichaelEdge:

  {{attachment:CassandraWritePath.png|text describing image|width=700}}
+ Write Path
+ The Local Coordinator
+ The local coordinator receives the write request from the client and performs the following:
+ 1.	The local coordinator determines which nodes are responsible for storing the data:
+ •	The first replica is chosen by the Partitioner, which hashes the partition key
+ •	Other replicas are chosen based on replication strategy defined for the keyspace
+ 2.	The write request is then sent to all replica nodes simultaneously.
+ 3.	The total number of nodes receiving the write request is determined by the replication
factor for the keyspace.
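+ The replica selection above can be sketched as a walk around a token ring. This is a minimal
illustration only: the MD5-based token_for, the (token, node) ring layout and the function names
are assumptions made for the example, and the clockwise walk mimics a simple replication
strategy rather than reproducing Cassandra's own code.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in for the real partitioner)."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def replicas_for_token(token, ring, replication_factor):
    """Pick replicas for a token.

    ring is a list of (token, node_name) pairs sorted by token. The first replica
    is the first node whose token is >= the key's token (wrapping around the ring);
    the remaining replicas are the next distinct nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen


def replicas_for(partition_key, ring, replication_factor):
    """Convenience wrapper: hash the key, then select replicas."""
    return replicas_for_token(token_for(partition_key), ring, replication_factor)
```

With a hypothetical ring [(10, "a"), (20, "b"), (30, "c")] and a replication factor of 2, a key
whose token is 15 lands on nodes b and c, and a key whose token is 35 wraps around to a and b.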
+ Replica Nodes
+ Replica nodes receive the write request from the local coordinator and perform the following:
+ 1.	Write data to the Commit Log. This is a sequential, memory-mapped log file on disk
that can be used to rebuild MemTables if a crash occurs before the MemTable is flushed to
disk.
+ 2.	Write data to the MemTable. MemTables are mutable, in-memory tables that are read/write.
Each physical table on each replica node has an associated MemTable.
+ 3.	If the write request is a DELETE operation (whether a delete of a column or a row), a
tombstone marker is written to the Commit Log and MemTable to indicate the delete.
+ 4.	If row caching is used, invalidate the cache for that row. Row cache is populated on
read only, so it must be invalidated when data for that row is written.
+ 5.	Acknowledge the write request back to the local coordinator.
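+ The five steps above can be sketched with a toy replica node. Everything here is a
hypothetical stand-in chosen for illustration: a Python list plays the Commit Log, a dict
plays the MemTable, and the class and method names are not Cassandra APIs.

```python
import time

TOMBSTONE = object()  # marker value recorded in place of deleted data


class ReplicaNode:
    def __init__(self):
        self.commit_log = []  # stands in for the append-only log file on disk
        self.memtable = {}    # row key -> {column: (value, timestamp)}
        self.row_cache = {}   # row key -> cached row, populated by reads only

    def write(self, row_key, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        # Step 1: append the mutation to the commit log (sequential write).
        self.commit_log.append((row_key, column, value, ts))
        # Step 2: apply the mutation to the in-memory table.
        self.memtable.setdefault(row_key, {})[column] = (value, ts)
        # Step 4: the cached copy of this row is now stale, so drop it.
        self.row_cache.pop(row_key, None)
        # Step 5: acknowledge back to the coordinator.
        return True

    def delete(self, row_key, column, timestamp=None):
        # Step 3: a delete is recorded as a write of a tombstone marker.
        return self.write(row_key, column, TOMBSTONE, timestamp)
```

Note that delete goes through the same path as write: nothing is removed in place, a marker
is simply appended to the log and stored in the MemTable.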
+ The local coordinator waits for the appropriate number of acknowledgements (dependent on
the consistency level for this write request) before acknowledging back to the client.
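+ As a rough illustration of how the number of acknowledgements depends on the consistency
level, the sketch below maps a few common levels to a count. The function name is hypothetical
and only a subset of levels is covered; QUORUM is taken to mean a majority of the replicas.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Return how many replica acknowledgements the coordinator waits for."""
    level = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")
```

For example, with a replication factor of 3 a QUORUM write is acknowledged to the client
after 2 replicas respond, even though the request was sent to all 3.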
+ Flushing MemTables
+ MemTables are flushed to disk based on various factors, some of which include:
+ •	commitlog_total_space_in_mb is exceeded
+ •	memtable_total_space_in_mb is exceeded
+ •	The ‘nodetool flush’ command is executed
+ Each flush of a MemTable results in one new, immutable SSTable (Sorted String Table) on
disk; once written, an SSTable is read-only. As with the write to the Commit Log, the write
to the SSTable data file is a sequential write operation. An SSTable consists of multiple
files, including the following:
+ •	Bloom Filter
+ •	Index
+ •	Compression File (optional)
+ •	Statistics File
+ •	Data File
+ •	Summary
+ •	TOC.txt
+ Each MemTable flush executes the following steps:
+ 1.	Sort the MemTable columns by row key
+ 2.	Write the Bloom Filter
+ 3.	Write the Index
+ 4.	Serialise and write the data to the SSTable Data File
+ 5.	Write Compression File (if compression is used)
+ 6.	Write Statistics File
+ 7.	Purge the written data from the Commit Log
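+ A simplified, in-memory sketch of the flush steps follows. It is an assumption-laden toy:
the "SSTable" is a dict rather than a set of files, rows are serialised as JSON lines, a
frozenset of keys stands in for the Bloom Filter (so it has no false positives), and the
compression and statistics steps are omitted. The function names are hypothetical.

```python
import json


def flush_memtable(memtable, commit_log):
    """Turn a memtable (row key -> row dict) into an immutable toy SSTable."""
    data = bytearray()
    index = {}
    for row_key in sorted(memtable):       # Step 1: sort by row key.
        index[row_key] = len(data)         # Step 3: index maps key -> byte offset.
        line = json.dumps({"key": row_key, "row": memtable[row_key]}) + "\n"
        data.extend(line.encode("utf-8"))  # Step 4: serialise rows sequentially.
    sstable = {
        "keys": frozenset(index),          # Step 2: stand-in for the Bloom Filter.
        "index": index,
        "data": bytes(data),
    }
    commit_log.clear()                     # Step 7: flushed data no longer needs the log.
    memtable.clear()                       # The flushed memtable is dropped.
    return sstable


def read_row(sstable, row_key):
    """Look a row up via the key filter and the index, then decode it."""
    if row_key not in sstable["keys"]:
        return None
    start = sstable["index"][row_key]
    end = sstable["data"].index(b"\n", start)
    return json.loads(sstable["data"][start:end])["row"]
```

Because rows are written in key order and the index records byte offsets, a lookup needs
one seek into the data rather than a scan of it.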
+ Unavailable Replica Nodes and Hinted Handoff
+ When a local coordinator is unable to send data to a replica node due to the replica node
being unavailable, the local coordinator stores the data in its local system.hints table;
this process is known as Hinted Handoff. The data is stored for a default period of 3 hours.
When the replica node comes back online the coordinator node will send the data to the replica
node.
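+ The hint lifecycle can be sketched as below. This is a toy model under stated assumptions:
a Python list stands in for the system.hints table, the 3-hour period is applied per hint
from the time it was stored, and the class, method and constant names are hypothetical.

```python
HINT_WINDOW_SECONDS = 3 * 60 * 60  # the default 3-hour period described above


class Coordinator:
    def __init__(self):
        self.hints = []  # (target_node, mutation, stored_at) tuples

    def store_hint(self, target, mutation, now):
        """Record a write that could not be delivered to an unavailable replica."""
        self.hints.append((target, mutation, now))

    def replay_hints(self, target, send, now):
        """Deliver unexpired hints for target via send(mutation).

        Expired hints for the target are dropped; hints for other nodes are kept.
        Returns the number of hints delivered.
        """
        remaining = []
        delivered = 0
        for node, mutation, stored_at in self.hints:
            if node != target:
                remaining.append((node, mutation, stored_at))
            elif now - stored_at <= HINT_WINDOW_SECONDS:
                send(mutation)
                delivered += 1
        self.hints = remaining
        return delivered
```

A caller would invoke replay_hints when it learns that the target replica is reachable
again, passing whatever function actually sends a mutation to that node.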
+ Write Path Advantages
+ •	The write path is one of Cassandra’s key strengths: for each write request one sequential
disk write plus one in-memory write occur, both of which are extremely fast.
+ •	During a write operation, Cassandra never reads before writing, never rewrites data,
never deletes data and never performs random I/O.
