Date: Tue, 15 Sep 2015 13:10:45 +0000 (UTC)
From: "Benedict (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

    [ https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745432#comment-14745432 ]

Benedict commented on CASSANDRA-9669:
-------------------------------------

bq. Memtable.isCleanAfter doesn't look right.

Good catch, I have inverted the implementation and meaning, and renamed it to {{mayContainDataSince}}

bq. "ka" reader says compatible with "kb" data but read will fail

Hmm. This is a bit of a problem.
I must admit I didn't look too closely at compatibility, since I assumed the whole point of the major/minor chars was to permit intra- and extra-version sstable version increments. Without that, this seems to be a bit of a mess. I guess we will need to increment all of the sstable versions past the max of the current. We should perhaps rethink accepting all versions <= first char of current, as it doesn't permit much flexibility.

Thanks. I'll address this, your nits, and what looks like a relatively innocuous problem with DTCS shortly.

> If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, on restart we simply take the maximum replay position of any sstable on disk, and ignore anything prior.
> It is quite possible for there to be two flushes triggered for a given table, and for the second to finish first by virtue of containing a much smaller quantity of live data (or perhaps the disk is just under less pressure). If we crash before the first sstable has been written, then on restart the data it would have represented will disappear, since we will not replay the CL records.
> This looks to be a bug present since time immemorial, and also seems pretty serious.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
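
The failure mode in the quoted issue description can be illustrated with a small model. This is only a sketch under stated assumptions, not Cassandra code: the `Flush` record, the integer commit log positions, and all three functions are hypothetical names invented for illustration. `replay_from_max` models the restart rule the description criticises (take the maximum replay position of any sstable on disk). `replay_from_contiguous` is one possible safer rule, shown purely for contrast; it is an assumption and not necessarily the fix that was committed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flush:
    """Hypothetical record of one memtable flush (not a Cassandra class)."""
    start: int      # first commit log position covered by the flush
    end: int        # position just past the last record covered
    on_disk: bool   # True if the sstable was fully written before the crash


def replay_from_max(flushes):
    """Restart rule from the issue: replay only records after the maximum
    replay position of any sstable on disk, ignoring anything prior."""
    return max((f.end for f in flushes if f.on_disk), default=0)


def replay_from_contiguous(flushes):
    """Assumed alternative: advance the replay start only across flushes
    that are persisted without a gap, starting from position 0."""
    position = 0
    for f in sorted(flushes, key=lambda f: f.start):
        if not f.on_disk or f.start > position:
            break  # a gap: everything from here on must be replayed
        position = max(position, f.end)
    return position


def lost_ranges(flushes, replay_start):
    """Commit log ranges that are neither in an sstable nor replayed."""
    return [
        (f.start, min(f.end, replay_start))
        for f in flushes
        if not f.on_disk and f.start < replay_start
    ]


if __name__ == "__main__":
    # The first flush is large and still in progress at the crash;
    # the second flush is small and finished first.
    flushes = [Flush(0, 100, on_disk=False), Flush(100, 110, on_disk=True)]
    for rule in (replay_from_max, replay_from_contiguous):
        start = rule(flushes)
        print(rule.__name__, "replays from", start,
              "lost:", lost_ranges(flushes, start))
```

With the two flushes above, the maximum-position rule skips the commit log range belonging to the unfinished first flush, which is the data loss the description refers to, while the contiguous rule replays it.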