hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-3906) QJM: quorum timeout on failover with large log segment
Date Tue, 11 Sep 2012 06:33:09 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-3906.

       Resolution: Fixed
    Fix Version/s: QuorumJournalManager (HDFS-3077)
     Hadoop Flags: Reviewed

Committed to branch, thanks
> QJM: quorum timeout on failover with large log segment
> ------------------------------------------------------
>                 Key: HDFS-3906
>                 URL: https://issues.apache.org/jira/browse/HDFS-3906
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: QuorumJournalManager (HDFS-3077)
>         Attachments: hdfs-3906.txt
> In doing some stress tests, I ran into an issue with failover if the current edit log
> segment written by the old active is large. With a 327MB log segment containing 6.4M transactions,
> the JN took ~11 seconds to read and validate it during the recovery step. This was longer
> than the 10 second timeout for createNewEpoch, which caused the recovery to fail.
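
The figures quoted above can be turned into a rough back-of-the-envelope check. The sketch below is not Hadoop code: the class and method names are hypothetical, and it assumes that validation time scales linearly with segment size, which the report does not state. It only uses the numbers from the description (327MB, ~11 seconds, 10 second timeout).

```java
// Rough estimate built only from the figures in the report.
// All names here are hypothetical; none of this is part of the Hadoop API.
class SegmentRecoveryEstimate {

    // Observed read-and-validate throughput in MB/s.
    static double throughputMbPerSec(double segmentMb, double seconds) {
        return segmentMb / seconds;
    }

    // Largest segment (in MB) that fits inside the timeout, ASSUMING
    // validation time grows linearly with segment size.
    static double maxSegmentMb(double throughputMbPerSec, double timeoutSec) {
        return throughputMbPerSec * timeoutSec;
    }

    // True if validating a segment of this size would outlast the timeout.
    static boolean exceedsTimeout(double segmentMb, double throughputMbPerSec,
                                  double timeoutSec) {
        return segmentMb / throughputMbPerSec > timeoutSec;
    }

    public static void main(String[] args) {
        double segmentMb = 327.0;   // segment size from the report
        double observedSec = 11.0;  // approximate validation time from the report
        double timeoutSec = 10.0;   // timeout from the report

        double rate = throughputMbPerSec(segmentMb, observedSec);
        System.out.println("throughput (MB/s): " + rate);
        System.out.println("max segment within timeout (MB): "
                + maxSegmentMb(rate, timeoutSec));
        System.out.println("exceeds timeout: "
                + exceedsTimeout(segmentMb, rate, timeoutSec));
    }
}
```

Under that linearity assumption, the JN was validating at roughly 30 MB/s, so any segment much beyond about 297MB would overrun a 10 second timeout on similar hardware. This is only an illustration of why the failure appears with large segments; it says nothing about how the committed patch addresses it.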

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
