cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9669) Commit Log Replay is Broken
Date Fri, 17 Jul 2015 10:59:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631186#comment-14631186 ]

Benedict commented on CASSANDRA-9669:
-------------------------------------

So, I am liking this approach less and less. It may be the least effort, but it has too many
sharp edges in critical portions of the system. It also needs a separate custom implementation
for each of 2.0, 2.1, 2.2 _and_ 3.0.

I think I will introduce a new commit log expiration ledger, and just write to it whenever
we perform a {{discardCompletedSegments()}} call. The ledger is then replayed prior to CL replay,
to build the state of which records we consider replayable. Initially, I will limit this to
a simple statement of "latest replay position we can be certain to have replayed to", since
this behaviour is uniform across 2.0+. 2.1+ easily supports ranges, which can be implemented
when we deliver CASSANDRA-8496.
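
A minimal sketch of how such a ledger might look, to make the proposal concrete. Everything here is hypothetical: the class {{ExpirationLedgerSketch}}, the {{Position}} type and the method names are illustrative stand-ins, not Cassandra APIs, and an in-memory list stands in for what would need to be a durable on-disk file.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpirationLedgerSketch {

    /** Hypothetical stand-in for a commit log position: segment id plus offset within it. */
    public static final class Position implements Comparable<Position> {
        public static final Position NONE = new Position(-1L, 0);
        public final long segmentId;
        public final int offset;

        public Position(long segmentId, int offset) {
            this.segmentId = segmentId;
            this.offset = offset;
        }

        @Override
        public int compareTo(Position other) {
            int bySegment = Long.compare(segmentId, other.segmentId);
            return bySegment != 0 ? bySegment : Integer.compare(offset, other.offset);
        }
    }

    // Append-only entries. Assumption: a real ledger would be written durably to disk.
    private final List<Position> entries = new ArrayList<>();

    /** Called whenever segments are discarded: everything up to 'upTo' is known to be persisted. */
    public void recordDiscard(Position upTo) {
        entries.add(upTo);
    }

    /** Run before commit log replay: the latest position recorded in the ledger. */
    public Position replayLowerBound() {
        Position latest = Position.NONE;
        for (Position p : entries) {
            if (p.compareTo(latest) > 0) {
                latest = p;
            }
        }
        return latest;
    }

    /** A commit log record is considered replayable only if it lies after the lower bound. */
    public boolean isReplayable(Position record) {
        return record.compareTo(replayLowerBound()) > 0;
    }

    public static void main(String[] args) {
        ExpirationLedgerSketch ledger = new ExpirationLedgerSketch();
        ledger.recordDiscard(new Position(2L, 900));
        ledger.recordDiscard(new Position(3L, 400));
        System.out.println(ledger.isReplayable(new Position(3L, 100)));
        System.out.println(ledger.isReplayable(new Position(3L, 401)));
    }
}
```

The point of the sketch is that the lower bound comes only from what was explicitly recorded at discard time, rather than being inferred from whichever sstables happen to be on disk at restart.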

> Commit Log Replay is Broken
> ---------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, on restart
> we simply take the maximum replay position of any sstable on disk, and ignore anything prior.
>
> It is quite possible for there to be two flushes triggered for a given table, and for
> the second to finish first by virtue of containing a much smaller quantity of live data (or
> perhaps the disk is just under less pressure). If we crash before the first sstable has been
> written, then on restart the data it would have represented will disappear, since we will
> not replay the CL records.
>
> This looks to be a bug present since time immemorial, and also seems pretty serious.
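
The failure mode in the description can be illustrated with a toy model. This is a simplification under stated assumptions: replay positions are reduced to plain {{long}} values, and {{ReplayBoundSketch}} and its methods are hypothetical names, not Cassandra code. The first flush covers commit log records up to position 100 and the second covers records up to 150; only the second sstable reaches disk before the crash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReplayBoundSketch {

    /** Restart behaviour as described in the ticket: take the maximum replay position of any sstable on disk. */
    public static long lowerBoundFromSSTables(List<Long> sstableReplayPositions) {
        return sstableReplayPositions.isEmpty() ? -1L : Collections.max(sstableReplayPositions);
    }

    /** Commit log records at or before the lower bound are ignored during replay. */
    public static List<Long> replayed(List<Long> commitLogPositions, long lowerBound) {
        List<Long> result = new ArrayList<>();
        for (long position : commitLogPositions) {
            if (position > lowerBound) {
                result.add(position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Records 10, 50, 100 belong to the first (large, slow) flush;
        // records 120, 150 belong to the second (small, fast) flush.
        List<Long> commitLog = Arrays.asList(10L, 50L, 100L, 120L, 150L);

        // Crash before the first sstable is written: only the second one is on disk.
        List<Long> sstablesOnDisk = Arrays.asList(150L);

        long bound = lowerBoundFromSSTables(sstablesOnDisk);
        // Nothing at or before 'bound' is replayed, so records 10, 50 and 100 are never recovered.
        System.out.println("lower bound: " + bound);
        System.out.println("replayed: " + replayed(commitLog, bound));
    }
}
```

In this model the records from the first flush exist neither in an sstable nor in the replayed set, which is the data loss the ticket describes.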



