Subject: [Cassandra Wiki] Update of "WritePathForUsers" by MichaelEdge
Date: Mon, 30 Nov 2015 06:40:03 -0000

The "WritePathForUsers" page has been changed by MichaelEdge:
https://wiki.apache.org/cassandra/WritePathForUsers?action=diff&rev1=10&rev2=11

{{attachment:CassandraWritePath.png|Cassandra write path|width=700}}

Write Path

The Local Coordinator

The local coordinator receives the write request from the client and performs the following:
 1. Determine which nodes are responsible for storing the data:
    • The first replica is chosen by the Partitioner, which hashes the partition key (the first component of the primary key) to a token.
    • The remaining replicas are chosen by the replication strategy defined for the keyspace.
 2. Send the write request to all replica nodes simultaneously.
 3. The total number of nodes receiving the write request is determined by the replication factor for the keyspace.

Replica Nodes

Replica nodes receive the write request from the local coordinator and perform the following:
 1. Write the data to the Commit Log. This is a sequential, memory-mapped log file on disk. It can be used to rebuild MemTables if a crash occurs before the MemTable is flushed to disk.
 2. Write the data to the MemTable. MemTables are mutable, in-memory tables that serve both reads and writes. Each physical table on each replica node has an associated MemTable.
 3. If the write request is a DELETE operation (whether of a column or a row), write a tombstone marker to the Commit Log and MemTable to record the delete.
 4. If row caching is used, invalidate the cache for that row.
    The row cache is populated only on reads, so it must be invalidated whenever data for that row is written.
 5. Acknowledge the write request back to the local coordinator.

The local coordinator waits for the number of acknowledgements required by the write request's consistency level before acknowledging back to the client.

Flushing MemTables

MemTables are flushed to disk based on various factors, including:
 • commitlog_total_space_in_mb is exceeded
 • memtable_total_space_in_mb is exceeded
 • the nodetool flush command is executed

Each MemTable flush produces one new, immutable SSTable (Sorted String Table) on disk. Once written, an SSTable is read-only. As with the write to the Commit Log, writing the SSTable Data File is a sequential operation. An SSTable consists of multiple files, including the following:
 • Bloom Filter
 • Index
 • Compression File (optional)
 • Statistics File
 • Data File
 • Summary
 • TOC.txt

Each MemTable flush executes the following steps:
 1. Sort the MemTable columns by row key
 2. Write the Bloom Filter
 3. Write the Index
 4. Serialise and write the data to the SSTable Data File
 5. Write the Compression File (if compression is used)
 6. Write the Statistics File
 7. Purge the written data from the Commit Log

Unavailable Replica Nodes and Hinted Handoff

Sometimes the local coordinator cannot send data to a replica node because that replica is unavailable. In that case it stores the data in its local system.hints table; this process is known as Hinted Handoff. By default, hints are only collected for a replica that has been down for less than 3 hours (max_hint_window_in_ms). When the replica node comes back online, the coordinator sends it the stored data.
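The coordinator behaviour described under The Local Coordinator and Hinted Handoff can be sketched as a toy, single-process model. This is a minimal sketch under stated assumptions, not Cassandra's implementation:

 • It uses SimpleStrategy-style placement: walk the token ring clockwise from the partition key's token.
 • MD5 from Python's hashlib stands in for the partitioner; Cassandra's default Murmur3Partitioner uses Murmur3.
 • The names Ring, Coordinator, coordinate_write, required_acks and replay_hints are hypothetical.
 • Replicas are contacted in a loop rather than in parallel.
 • Timeouts, the unavailable error and the hint window are not modelled.

```python
import bisect
import hashlib
from dataclasses import dataclass, field


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in for the real partitioner)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


@dataclass
class Ring:
    """Hypothetical token ring: each node owns exactly one token."""
    tokens: dict  # node name -> token

    def replicas(self, partition_key: str, replication_factor: int) -> list:
        # Order nodes by token so the ring can be walked clockwise.
        ordered = sorted(self.tokens.items(), key=lambda item: item[1])
        ring_tokens = [tok for _, tok in ordered]
        # First replica: first node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(ring_tokens, token_for(partition_key)) % len(ordered)
        # Remaining replicas: the next nodes clockwise (SimpleStrategy-style assumption).
        count = min(replication_factor, len(ordered))
        return [ordered[(start + i) % len(ordered)][0] for i in range(count)]


def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed before the coordinator answers the client."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3,
              "QUORUM": replication_factor // 2 + 1,
              "ALL": replication_factor}
    return levels[consistency_level]


@dataclass
class Coordinator:
    ring: Ring
    up: set                                     # nodes currently reachable
    hints: dict = field(default_factory=dict)   # down node -> list of hinted writes

    def coordinate_write(self, key, value, rf, consistency_level) -> bool:
        acks = 0
        for node in self.ring.replicas(key, rf):   # every replica gets the write
            if node in self.up:
                acks += 1                          # model: a live replica always acks
            else:
                # Hinted handoff: keep the write locally for the unavailable replica.
                self.hints.setdefault(node, []).append((key, value))
        # Report success only if the consistency level was satisfied.
        return acks >= required_acks(consistency_level, rf)

    def replay_hints(self, node) -> list:
        """Replica is back online: hand over its hinted writes and forget them."""
        self.up.add(node)
        return self.hints.pop(node, [])
```

For example, with four nodes, a replication factor of 4 and node D down, a QUORUM write (3 acks needed) succeeds while an ALL write fails. D's missed writes wait in hints until replay_hints("D") is called. A real coordinator would instead reject the ALL write up front, because too few replicas are alive.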

Write Path Advantages
 • The write path is one of Cassandra’s key strengths: each write request performs one sequential disk write plus one in-memory write, both of which are extremely fast.
 • During a write operation, Cassandra never reads before writing, never rewrites data, never deletes data and never performs random I/O.
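The replica-side steps described under Replica Nodes and Flushing MemTables can be illustrated with a toy in-memory model. So can the property that a write is one sequential append plus one in-memory update, with no read first. This is a minimal sketch under stated assumptions, not Cassandra's storage engine:

 • The class ReplicaNode, its methods, the TOMBSTONE sentinel and the row-count flush_threshold are hypothetical.
 • The "commit log" and "SSTables" are Python lists rather than files.
 • There are no Bloom filters, indexes or compaction.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # hypothetical marker recorded in place of a deleted value


@dataclass
class ReplicaNode:
    flush_threshold: int = 3                        # hypothetical: flush after N rows
    commit_log: list = field(default_factory=list)  # append-only, like the on-disk log
    memtable: dict = field(default_factory=dict)    # mutable, in-memory table
    sstables: list = field(default_factory=list)    # immutable, sorted snapshots
    row_cache: dict = field(default_factory=dict)   # filled only by reads (not modelled)

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: one sequential append
        self.memtable[key] = value            # step 2: one in-memory write, no read first
        self.row_cache.pop(key, None)         # step 4: invalidate any cached copy
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ACK"                          # step 5: acknowledge to the coordinator

    def delete(self, key):
        # Step 3: a delete is just another write of a tombstone, never an in-place removal.
        return self.write(key, TOMBSTONE)

    def flush(self):
        # Write the MemTable out in key order as a new SSTable. A tuple is used
        # so the flushed table cannot be modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs to be replayed
```

In Cassandra itself, flushes are triggered by the size limits and the nodetool flush command listed under Flushing MemTables, not by a row count.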