hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3906) QJM: quorum timeout on failover with large log segment
Date Tue, 11 Sep 2012 05:08:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452719#comment-13452719 ]

Aaron T. Myers commented on HDFS-3906:

+1, the patch looks good to me, and I agree that the refactor makes things a little clearer.
> QJM: quorum timeout on failover with large log segment
> ------------------------------------------------------
>                 Key: HDFS-3906
>                 URL: https://issues.apache.org/jira/browse/HDFS-3906
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-3906.txt
>
> In doing some stress tests, I ran into an issue with failover if the current edit log
> segment written by the old active is large. With a 327MB log segment containing 6.4M transactions,
> the JN took ~11 seconds to read and validate it during the recovery step. This was longer
> than the 10 second timeout for createNewEpoch, which caused the recovery to fail.
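
The figures in the description can be put side by side with a little arithmetic. The sketch below is illustrative only and is not taken from the attached patch: it assumes, purely for the sake of the estimate, that validation time grows linearly with segment size, and the function and constant names are hypothetical.

```python
# Figures reported in the issue description above.
SEGMENT_MB = 327            # size of the in-progress edit log segment
TRANSACTIONS = 6_400_000    # transactions contained in that segment
VALIDATE_SECONDS = 11.0     # approximate time the JN took ("~11 seconds")
TIMEOUT_SECONDS = 10.0      # createNewEpoch timeout cited in the report


def validation_rate_mb_per_s(segment_mb: float, seconds: float) -> float:
    """Observed read-and-validate throughput, in MB per second."""
    return segment_mb / seconds


def largest_segment_within_timeout(rate_mb_per_s: float, timeout_s: float) -> float:
    """Largest segment (MB) that finishes before the timeout.

    ASSUMPTION: validation time scales linearly with segment size.
    """
    return rate_mb_per_s * timeout_s


rate = validation_rate_mb_per_s(SEGMENT_MB, VALIDATE_SECONDS)
max_mb = largest_segment_within_timeout(rate, TIMEOUT_SECONDS)
txn_rate = TRANSACTIONS / VALIDATE_SECONDS

print(f"validation rate: {rate:.1f} MB/s ({txn_rate:,.0f} txns/s)")
print(f"largest segment that fits in {TIMEOUT_SECONDS:.0f}s: {max_mb:.0f} MB")
print(f"reported segment exceeds that bound: {SEGMENT_MB > max_mb}")
```

Under that linearity assumption, the reported rate of roughly 30 MB/s means any segment larger than about 297 MB would miss a 10 second deadline, which is consistent with the 327MB segment failing recovery.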

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
