Date: Mon, 13 Nov 2017 22:08:00 +0000 (UTC)
From: "sankalp kohli (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability

    [ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250338#comment-16250338 ]

sankalp kohli commented on CASSANDRA-13987:
-------------------------------------------

+1 for doing this in 3.0+

> Multithreaded commitlog subtly changed durability
> -------------------------------------------------
>
>                 Key: CASSANDRA-13987
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly changed the way that commitlog durability worked. Everything still gets written to an mmap file. However, in periodic mode, not everything is replayable from the mmapped file after a process crash.
> In brief, the reason this changed is the chained markers that are required for the multithreaded commit log.
> At each msync, we wait for outstanding mutations to serialize into the commitlog, and update a marker before and after the commits that have accumulated since the last sync. With those markers, we can safely replay that section of the commitlog. Without the markers, we have no guarantee that the commits in that section were successfully written, so we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the same time (see ["There Is No Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have data loss if none of the nodes msync the commitlog. For example, with RF=3, if a quorum write succeeds on two nodes (and we acknowledge the write back to the client), and then the process on both nodes OOMs (say, due to reading the index for a 100GB partition), the write will be lost if neither process msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. The reason this data is silently lost is the chained markers that were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 'durability' across a process crash, not a host crash. (Note: operators should use batch mode to ensure greater durability, but batch mode in its current implementation a) is borked, and b) will burn through, *very* rapidly, SSDs that don't have a non-volatile write cache sitting in front.)
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which means that a node could lose up to ten seconds of data due to a process crash. The unfortunate thing is that the data is still available, in the mmap file, but we can't replay it due to incomplete chained markers.
> ftr, I don't believe we've ever had a stated policy about commitlog durability wrt process crash.
> Pre-2.0 we naturally piggy-backed off the memory-mapped file and the fact that every mutation acquired a lock and wrote into the mmap buffer, so the ability to replay everything out of it came for free. With CASSANDRA-3578, that was subtly changed.
> Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust the durability guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit] of each commit in InnoDB via {{innodb_flush_log_at_trx_commit}}. I'm using that idea as a loose springboard for what to do here.
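
For readers following the ticket, the replay behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra's code or on-disk format: `ToyCommitLog`, `replay`, the 8-byte marker layout and the length-prefixed records are all hypothetical names and choices made for illustration. It shows that a mutation written after the last sync is still present in the buffer, yet replay stops at the first incomplete marker and abandons it.

```python
import struct
import zlib

# Hypothetical marker layout: (offset of the next marker, CRC32 of that offset).
MARKER = struct.Struct(">II")


class ToyCommitLog:
    """Toy segment with chained sync markers (illustrative only)."""

    def __init__(self, size=1024):
        self.buf = bytearray(size)   # stands in for the mmap'ed segment
        self.last_marker = 0         # marker that the next sync will complete
        self.pos = MARKER.size       # next free byte, after the reserved marker

    def append(self, payload: bytes) -> None:
        # Mutations always land in the buffer, whether or not a sync follows.
        record = struct.pack(">I", len(payload)) + payload
        self.buf[self.pos:self.pos + len(record)] = record
        self.pos += len(record)

    def sync(self) -> None:
        # Complete the pending marker so it points at the next one,
        # then reserve space for that next (still empty) marker.
        nxt = self.pos
        crc = zlib.crc32(struct.pack(">I", nxt))
        MARKER.pack_into(self.buf, self.last_marker, nxt, crc)
        self.last_marker = nxt
        self.pos = nxt + MARKER.size


def replay(buf) -> list:
    """Follow the marker chain; stop at the first incomplete marker."""
    out, marker = [], 0
    while marker + MARKER.size <= len(buf):
        end, crc = MARKER.unpack_from(buf, marker)
        valid = marker < end <= len(buf) and crc == zlib.crc32(struct.pack(">I", end))
        if not valid:
            break  # everything after this point is abandoned
        pos = marker + MARKER.size
        while pos < end:
            (n,) = struct.unpack_from(">I", buf, pos)
            out.append(bytes(buf[pos + 4:pos + 4 + n]))
            pos += 4 + n
        marker = end
    return out


log = ToyCommitLog()
log.append(b"synced write")
log.sync()
log.append(b"unsynced write")              # process "crashes" before the next sync
print(replay(log.buf))                     # only the synced write is replayed
print(b"unsynced write" in bytes(log.buf)) # the bytes are still in the buffer
```

In this model the unsynced write is lost on replay even though it is sitting in the buffer, which is the gap between "written to the mmap file" and "replayable" that the description points at.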