Date: Tue, 31 May 2016 08:47:12 +0000 (UTC)
From: "Sam Tunnicliffe (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records


     [ https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-9669:
---------------------------------------
       Resolution: Fixed
    Reproduced In: 2.1.7, 2.0.16, 1.2.19  (was: 1.2.19, 2.0.16, 2.1.7)
           Status: Resolved  (was: Patch Available)

ok, committed Branimir's patch to 2.2 in {{66c8f2b7f79fe794cc1e0594d9add260c209a9a2}} and mine to 3.0 in {{81ffc4601952ff3a9fec8493cd27fe52544ea115}}.

> If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Critical
>              Labels: correctness
>             Fix For: 2.2.7, 3.7, 3.0.7
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, on restart we simply take the maximum replay position of any sstable on disk, and ignore anything prior.
> It is quite possible for there to be two flushes triggered for a given table, and for the second to finish first by virtue of containing a much smaller quantity of live data (or perhaps the disk is just under less pressure). If we crash before the first sstable has been written, then on restart the data it would have represented will disappear, since we will not replay the CL records.
> This looks to be a bug present since time immemorial, and also seems pretty serious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
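
The failure mode described in the issue can be illustrated with a toy model. This is only a sketch and not Cassandra code: commit log positions are modelled as plain integers, each flush covers a half-open range of positions, and all function names are hypothetical. `replay_from_max` mimics the "maximum replay position of any sstable on disk" rule from the description; `replay_from_contiguous` is an assumed alternative shown purely for contrast, and is not claimed to be what the committed patches do.

```python
# Toy model, not Cassandra code. Commit log positions are plain integers and
# each flush covers a half-open range [start, end) of those positions.

def replay_from_max(completed):
    """Rule described in the report: skip everything up to the largest
    replay position recorded by any sstable on disk."""
    return max((end for _, end in completed), default=0)

def replay_from_contiguous(completed):
    """Hypothetical alternative: skip only the prefix of the log that is
    covered without gaps by completed flushes."""
    covered = 0
    for start, end in sorted(completed):
        if start > covered:
            break  # gap: an earlier flush never reached disk
        covered = max(covered, end)
    return covered

def lost_records(log, completed, replay_start):
    """Positions that are neither in a completed sstable nor replayed."""
    on_disk = {p for start, end in completed for p in range(start, end)}
    return [p for p in log if p not in on_disk and p < replay_start]

commit_log = list(range(10))
first_flush = (0, 6)    # large flush, still being written at crash time
second_flush = (6, 10)  # small flush, finished first
completed = [second_flush]  # only the second sstable exists after the crash

lost_with_max = lost_records(commit_log, completed, replay_from_max(completed))
lost_with_contiguous = lost_records(
    commit_log, completed, replay_from_contiguous(completed)
)
print("lost with max rule:", lost_with_max)
print("lost with contiguous rule:", lost_with_contiguous)
```

In this model the max rule skips the whole log because the second sstable records position 10, so the records the first flush would have written are never replayed, while the gap-aware rule replays from position 0.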