cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records
Date Wed, 29 Jul 2015 10:31:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645845#comment-14645845 ]

Benedict commented on CASSANDRA-9669:
-------------------------------------

So, I have a patch available for this [here|https://github.com/belliottsmith/cassandra/tree/9669-2.0]

I managed to make it less invasive than I had anticipated, but it still requires an sstable
version increment. The patch:

* Introduces a commitLogLowerBound to the memtable, which tracks the commit log position at
its creation
* Changes sstable metadata's "replayPosition" into "commitLogLowerBound" and "commitLogUpperBound"
in the new sstable version
* Delays exposing a new sstable to the compaction strategy until all of its preceding flushes
have completed
* On compaction, extends the new sstable's lower/upper bounds to the min/max of all sstables
we're replacing. Given the previous point (delayed exposure to the compaction strategy), we
only extend over ranges that are known to already be covered by other sstables.
* On replay, treats any range covered by an sstable as not needing replay (any range prior
to the earliest known safe range is also ignored)
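
To make the compaction and replay rules above concrete, here is a minimal sketch in Python. It is not the patch's code: commit log positions are modelled as plain integers, each sstable's bounds as a half-open interval [lower, upper), and the function names (compacted_bounds, merge_covered, needs_replay) are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Assumption: (commit_log_lower_bound, commit_log_upper_bound), half-open [lower, upper)
Interval = Tuple[int, int]


def compacted_bounds(inputs: List[Interval]) -> Interval:
    """Bounds for a compacted sstable: min of the lower bounds, max of the upper bounds."""
    return (min(lo for lo, _ in inputs), max(hi for _, hi in inputs))


def merge_covered(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted list of disjoint ranges."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def needs_replay(position: int, intervals: List[Interval]) -> bool:
    """Decide whether a commit log record at `position` must be replayed."""
    covered = merge_covered(intervals)
    if not covered:
        return True  # nothing on disk vouches for any record
    if position < covered[0][0]:
        return False  # prior to the earliest known safe range: ignored
    # Replay only what no sstable's range covers (gaps, and the tail).
    return not any(lo <= position < hi for lo, hi in covered)
```

With sstables covering [0, 100) and [200, 300), a record at position 150 falls in the gap and is replayed, while records at 50 and 250 are skipped. Note that compacted_bounds([(0, 100), (200, 300)]) would yield (0, 300) and hide that gap, which is why the sketch only makes sense if an sstable is withheld from compaction until its preceding flushes have completed.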

Test Engineering: there are failures on dtests, but I cannot tell if these are new or existing.
Mostly they look like flaky tests. The one that looks most worrisome to me is the counter upgrade
test, but could you take a look and tell me what you think of the test situation in general?
Modifying 2.0 makes me uncomfortable.

> If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, on restart we simply take the maximum replay position of any sstable on disk, and ignore anything prior.
> It is quite possible for there to be two flushes triggered for a given table, and for the second to finish first by virtue of containing a much smaller quantity of live data (or perhaps the disk is just under less pressure). If we crash before the first sstable has been written, then on restart the data it would have represented will disappear, since we will not replay the CL records.
> This looks to be a bug present since time immemorial, and also seems pretty serious.
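
The failure mode described in the issue can be reproduced with a small Python model. This is a sketch under stated assumptions, not Cassandra code: commit log positions are integers, and replayed_by_old_logic and lost_records are hypothetical helpers that mimic "take the maximum replay position of any sstable on disk, and ignore anything prior".

```python
from typing import Iterable, List, Set


def replayed_by_old_logic(cl_records: Iterable[int],
                          sstable_replay_positions: Iterable[int]) -> List[int]:
    """Replay only records at or after the maximum replay position on disk."""
    start = max(sstable_replay_positions, default=0)
    return [p for p in cl_records if p >= start]


def lost_records(cl_records: Iterable[int],
                 durable_in_sstables: Set[int],
                 replayed: Iterable[int]) -> List[int]:
    """Records that are neither in an sstable on disk nor replayed after restart."""
    replayed_set = set(replayed)
    return [p for p in cl_records
            if p not in durable_in_sstables and p not in replayed_set]


# Scenario: the first flush covers positions below 100, the second covers 100..119.
# The second flush finishes first (replay position 120); we crash before the
# first sstable is written.
records = [10, 50, 90, 105, 115, 125]
durable = {105, 115}                       # only the second flush reached disk
replayed = replayed_by_old_logic(records, [120])
lost = lost_records(records, durable, replayed)
```

Here only the record at 125 is replayed, so the records at 10, 50 and 90, which belonged to the unfinished first flush, end up in `lost`.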



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
