cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9195) commitlog replay only actually replays mutation every other time
Date Thu, 23 Apr 2015 16:55:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509353#comment-14509353 ]

Branimir Lambov commented on CASSANDRA-9195:
--------------------------------------------

bq. Then if you go and replay those commitlogs up to some time before the truncation C* should
recognize that the replay is strictly before any truncation took place and let things replay.

There is a big problem with this: you now have a table holding a lot of data that predates
the table's recorded truncation time. This violates assumptions made elsewhere, may cause
all sorts of trouble, and is generally a state that should not be tolerated.

Thus a truncation record reset has to be part of the process. And if it is present, it doesn't
make any sense to have it anywhere other than at the start, which in turn allows the commitlog
to operate correctly. To my limited understanding, the restore was started by the sstableloader
call, hence I put the flag there. As I now see that this is not always the case, perhaps a
standalone reset-truncation-record utility is required?
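
To illustrate the argument, here is a toy Python model of the interaction between a truncation
record and a point-in-time replay. It is only a sketch under stated assumptions: the names
Table, replay and reset_truncation_record are hypothetical, timestamps are plain integers, and
Cassandra's real bookkeeping differs in detail. It is meant to show why replayed mutations that
are older than the recorded truncation get dropped, and why resetting the record at the start
of the restore lets them through.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (key, write_timestamp) pairs standing in for commitlog mutations.
Mutation = Tuple[int, int]


@dataclass
class Table:
    """Hypothetical stand-in for a table plus its truncation record."""
    rows: Dict[int, int] = field(default_factory=dict)  # key -> write timestamp
    truncated_at: Optional[int] = None                   # None means "never truncated"

    def truncate(self, now: int) -> None:
        # Truncation drops the data and remembers when it happened.
        self.rows.clear()
        self.truncated_at = now

    def reset_truncation_record(self) -> None:
        # The step proposed in the comment: forget the truncation
        # before the restore begins.
        self.truncated_at = None


def replay(table: Table, log: List[Mutation], restore_point: int) -> int:
    """Apply mutations up to restore_point; return how many were applied."""
    applied = 0
    for key, ts in log:
        if ts > restore_point:
            continue  # beyond the requested point in time
        if table.truncated_at is not None and ts <= table.truncated_at:
            continue  # considered already truncated, so it is skipped
        table.rows[key] = ts
        applied += 1
    return applied
```

In this model, truncating at time 100 and then replaying a log whose mutations all carry
timestamps below 100 applies nothing. Calling reset_truncation_record() first makes the same
replay apply every mutation up to the restore point, and the table no longer holds data older
than its recorded truncation time.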

> commitlog replay only actually replays mutation every other time
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-9195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jon Moses
>            Assignee: Branimir Lambov
>            Priority: Critical
>             Fix For: 2.1.5
>
>         Attachments: 9195-v2.1.patch, loader.py
>
>
> Version: Cassandra 2.1.4.374 | DSE 4.7.0
> The main issue here is that the restore-cycle only replays the mutations
> every other try.  On the first try, it will restore the snapshot as expected
> and the cassandra system load will show that it's reading the mutations, but
> they do not actually get replayed, and at the end you're left with only the
> snapshot data (2k records).
> If you re-run the restore-cycle again, the commitlogs are replayed as expected,
> and the data expected is present in the table (4k records, with a spot check of 
> record 4500, as it's in the commitlog but not the snapshot).
> Then if you run the cycle again, it will fail.  Then again, and it will work.  The
> work/not-work pattern continues.  Even re-running the commitlog replay a second time,
> without reloading the snapshot, doesn't work.
> The load process is:
> * Modify commitlog segment to 1mb
> * Archive to directory
> * create keyspace/table
> * insert base data
> * initial snapshot
> * write more data
> * capture timestamp
> * write more data
> * final snapshot
> * copy commitlogs to 2nd location
> * modify cassandra-env to replay only specified keyspace
> * modify commitlog properties to restore from 2nd location, with noted timestamp
> The restore cycle is:
> * truncate table
> * sstableload snapshot
> * flush
> * output data status
> * restart to replay commitlogs
> * output data status
> ====
> See attached .py for a mostly automated reproduction scenario.  It expects DSE (and I
> found it with DSE 4.7.0-1), rather than "actual" Cassandra, but it's not using any DSE-specific
> features.  The script looks for the configs in the DSE locations, but they're set at the top,
> and there are only 2 places where DSE is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
