hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-3914) QJM: acceptRecovery should abort current segment
Date Mon, 10 Sep 2012 21:56:09 GMT
Todd Lipcon created HDFS-3914:
---------------------------------

             Summary: QJM: acceptRecovery should abort current segment
                 Key: HDFS-3914
                 URL: https://issues.apache.org/jira/browse/HDFS-3914
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: QuorumJournalManager (HDFS-3077)
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


Found this bug with randomized testing. The following sequence causes a problem:

- A JN is writing a segment starting at txid 1; it successfully writes txid 1, but nothing more
- The JN becomes partitioned from the NN, and a new NN takes over
- The new NN is also partitioned from this JN during the "prepareRecovery" phase of recovery, but connects properly for the "acceptRecovery" call
- acceptRecovery copies over a longer log segment (e.g. txns 1-3) from a good logger
- The new NN calls finalizeLogSegment(), but gets the following error: JournalOutOfSyncException: Trying to finalize in-progress log segment 1 to end at txid 3 but only written up to txid 1
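The sequence above can be replayed against a deliberately simplified, hypothetical model of a single JN. `BuggyJournal`, its fields, and `Hdfs3914Repro` are illustrative names, not Hadoop's `Journal` API. The network partitions are not modeled: the sketch only tracks the JN's in-memory writer state against the contents of its segment file. It assumes that finalizing the currently open segment is checked against what the writer itself has written, which is what the error message suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a single JournalNode (JN); names are illustrative
// and are not Hadoop's Journal API.
class BuggyJournal {
    List<Long> segmentFile = new ArrayList<>(); // txids held by the segment file
    long curSegmentTxId = -1;                   // first txid of the open segment, -1 if none
    long lastWrittenTxId = 0;                   // last txid written through the open segment

    void startLogSegment(long firstTxId) {
        curSegmentTxId = firstTxId;
        segmentFile = new ArrayList<>();
    }

    void journal(long txid) {
        segmentFile.add(txid);
        lastWrittenTxId = txid;
    }

    // Bug: the recovered segment replaces the file, but the open segment is not
    // aborted, so curSegmentTxId and lastWrittenTxId still describe the old file.
    void acceptRecovery(List<Long> recoveredTxns) {
        segmentFile = new ArrayList<>(recoveredTxns);
    }

    void finalizeLogSegment(long firstTxId, long lastTxId) {
        // Finalizing the currently open segment is checked against the writer's progress.
        if (firstTxId == curSegmentTxId && lastWrittenTxId != lastTxId) {
            throw new IllegalStateException(String.format(
                "Trying to finalize in-progress log segment %d to end at txid %d"
                    + " but only written up to txid %d",
                firstTxId, lastTxId, lastWrittenTxId));
        }
        curSegmentTxId = -1;
    }
}

class Hdfs3914Repro {
    // Replays the reported sequence; returns the error message, or null if none.
    static String run() {
        BuggyJournal jn = new BuggyJournal();
        jn.startLogSegment(1);
        jn.journal(1);                          // wrote txid 1, then got partitioned
        jn.acceptRecovery(List.of(1L, 2L, 3L)); // new NN syncs txns 1-3 from a good logger
        try {
            jn.finalizeLogSegment(1, 3);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running `main` prints the same text as the reported error. acceptRecovery swapped the file contents to txns 1-3, but the writer state still says the open segment only reached txid 1.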

This is because the "syncLog" call (which copies the new segment) isn't properly aborting
the old segment before replacing it.
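A minimal sketch of the fix described above, using a hypothetical toy model of a single JN (not Hadoop's actual code). Before the recovered segment replaces the local one, acceptRecovery aborts any open segment, so the JN stops treating the replaced file as one it is still writing. Finalization then validates against the recovered file's last txid. `FixedJournal`, `abortCurSegment`, and `Hdfs3914Fixed` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a single JournalNode (JN) with the fix applied.
// Names are illustrative; this is not Hadoop's Journal API.
class FixedJournal {
    List<Long> segmentFile = new ArrayList<>(); // txids held by the segment file
    long curSegmentTxId = -1;                   // first txid of the open segment, -1 if none
    long lastWrittenTxId = 0;                   // last txid written through the open segment

    void startLogSegment(long firstTxId) {
        curSegmentTxId = firstTxId;
        segmentFile = new ArrayList<>();
    }

    void journal(long txid) {
        if (curSegmentTxId == -1) {
            throw new IllegalStateException("No log segment is open for writing");
        }
        segmentFile.add(txid);
        lastWrittenTxId = txid;
    }

    // Abort: drop the open writer's state without finalizing the segment.
    void abortCurSegment() {
        curSegmentTxId = -1;
        lastWrittenTxId = 0;
    }

    // The fix: abort any open segment before the recovered copy replaces it,
    // so stale writer state cannot outlive the file it described.
    void acceptRecovery(List<Long> recoveredTxns) {
        if (curSegmentTxId != -1) {
            abortCurSegment();
        }
        segmentFile = new ArrayList<>(recoveredTxns);
    }

    void finalizeLogSegment(long firstTxId, long lastTxId) {
        long fileLastTxId = segmentFile.isEmpty()
            ? -1L : segmentFile.get(segmentFile.size() - 1);
        // An open segment is checked against the writer; otherwise check the file.
        long writtenUpTo = (firstTxId == curSegmentTxId) ? lastWrittenTxId : fileLastTxId;
        if (writtenUpTo != lastTxId) {
            throw new IllegalStateException(String.format(
                "Segment %d ends at txid %d, not %d", firstTxId, writtenUpTo, lastTxId));
        }
        curSegmentTxId = -1;
    }
}

class Hdfs3914Fixed {
    // Replays the reported sequence; returns true if finalization succeeds.
    static boolean run() {
        FixedJournal jn = new FixedJournal();
        jn.startLogSegment(1);
        jn.journal(1);
        jn.acceptRecovery(List.of(1L, 2L, 3L)); // aborts the open segment first
        jn.finalizeLogSegment(1, 3);            // checked against the recovered file
        return jn.curSegmentTxId == -1;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch, aborting only discards the writer state and does not finalize anything. The new NN's later finalizeLogSegment() call remains the step that closes the segment, and any stray write to the aborted segment fails instead of silently appending.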

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
