hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-3914) QJM: acceptRecovery should abort current segment
Date Mon, 10 Sep 2012 21:56:09 GMT
Todd Lipcon created HDFS-3914:

             Summary: QJM: acceptRecovery should abort current segment
                 Key: HDFS-3914
                 URL: https://issues.apache.org/jira/browse/HDFS-3914
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: QuorumJournalManager (HDFS-3077)
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon

Found this bug with randomized testing. The following sequence causes a problem:

- A JN is writing a segment starting at txid 1 and has successfully written txid 1, but nothing more
- The JN becomes partitioned from the NN, and a new NN takes over
- The new NN is also partitioned from this JN during the "prepareRecovery" phase of recovery, but connects properly
for the "acceptRecovery" call
- acceptRecovery copies over a longer log segment (e.g. txns 1-3) from a good logger
- new NN calls finalizeLogSegment(), but gets the following error: JournalOutOfSyncException:
Trying to finalize in-progress log segment 1 to end at txid 3 but only written up to txid

This is because the "syncLog" call (which copies the new segment) isn't properly aborting
the old segment before replacing it.
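
The failure mode can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual JournalNode code: the class ToyJournal, its fields, and the abortCurrentSegment flag are hypothetical names invented for illustration, and IllegalStateException stands in for JournalOutOfSyncException. The model keeps an open writer that remembers the highest txid it wrote; if recovery replaces the segment without dropping that writer, finalization is checked against the stale writer state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of a journal node; NOT the real Hadoop implementation.
class ToyJournal {
    private List<Long> segmentOnDisk = new ArrayList<>(); // txids stored locally
    private boolean writerOpen = false;                   // is a segment writer open?
    private long writerHighestTxId = -1;                  // highest txid that writer wrote
    private boolean finalized = false;

    void startLogSegment(long firstTxId) {
        segmentOnDisk = new ArrayList<>();
        writerOpen = true;
        writerHighestTxId = firstTxId - 1;
        finalized = false;
    }

    void journal(long txId) {
        segmentOnDisk.add(txId);
        writerHighestTxId = txId;
    }

    // Replaces the local segment with a copy from a good logger.
    // abortCurrentSegment=false models the reported bug (stale writer survives).
    void acceptRecovery(List<Long> goodSegment, boolean abortCurrentSegment) {
        if (abortCurrentSegment) {
            writerOpen = false;      // discard the old in-progress writer
            writerHighestTxId = -1;
        }
        segmentOnDisk = new ArrayList<>(goodSegment);
    }

    void finalizeLogSegment(long startTxId, long endTxId) {
        // A still-open writer must agree with the requested end txid.
        if (writerOpen && writerHighestTxId != endTxId) {
            throw new IllegalStateException(
                "out of sync: asked to finalize segment " + startTxId
                + " at txid " + endTxId
                + ", stale writer reached txid " + writerHighestTxId);
        }
        writerOpen = false;
        finalized = true;
    }

    boolean isFinalized() {
        return finalized;
    }

    public static void main(String[] args) {
        List<Long> good = Arrays.asList(1L, 2L, 3L);

        ToyJournal buggy = new ToyJournal();
        buggy.startLogSegment(1);
        buggy.journal(1);
        buggy.acceptRecovery(good, false);   // old segment not aborted
        try {
            buggy.finalizeLogSegment(1, 3);
        } catch (IllegalStateException e) {
            System.out.println("buggy path failed: " + e.getMessage());
        }

        ToyJournal fixed = new ToyJournal();
        fixed.startLogSegment(1);
        fixed.journal(1);
        fixed.acceptRecovery(good, true);    // old segment aborted first
        fixed.finalizeLogSegment(1, 3);
        System.out.println("fixed path finalized: " + fixed.isFinalized());
    }
}
```

In this sketch, aborting the current segment inside acceptRecovery is what lets the later finalizeLogSegment call succeed, which matches the change the summary asks for.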

