hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-3906) QJM: quorum timeout on failover with large log segment
Date Fri, 07 Sep 2012 21:24:07 GMT
Todd Lipcon created HDFS-3906:

             Summary: QJM: quorum timeout on failover with large log segment
                 Key: HDFS-3906
                 URL: https://issues.apache.org/jira/browse/HDFS-3906
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: QuorumJournalManager (HDFS-3077)
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Critical

While running stress tests, I hit a failover problem that occurs when the current edit log
segment written by the old active is large. With a 327MB log segment containing 6.4M transactions,
the JN took ~11 seconds to read and validate it during the recovery step. That exceeded
the 10 second timeout for createNewEpoch, so the recovery failed.
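
As a rough back-of-the-envelope check of the numbers above, the sketch below estimates validation throughput and the largest segment that would still fit inside the timeout. It assumes validation time scales linearly with segment size, which is an assumption and was not measured here. The class and method names are hypothetical and are not part of Hadoop or the QJM code.

```java
public class RecoveryTimeoutEstimate {
    // Figures reported in this issue.
    static final double SEGMENT_MB = 327.0;
    static final long SEGMENT_TXNS = 6_400_000L;
    static final double OBSERVED_VALIDATE_SECONDS = 11.0;        // reported as "~11 seconds"
    static final double CREATE_NEW_EPOCH_TIMEOUT_SECONDS = 10.0;

    // Observed read-and-validate throughput on the JN, in MB per second.
    static double mbPerSecond() {
        return SEGMENT_MB / OBSERVED_VALIDATE_SECONDS;
    }

    // Estimated validation time for a segment of the given size.
    // ASSUMPTION: time grows linearly with segment size.
    static double estimatedValidateSeconds(double segmentMb) {
        return segmentMb / mbPerSecond();
    }

    // True if the estimated validation time is longer than the timeout.
    static boolean exceedsTimeout(double segmentMb, double timeoutSeconds) {
        return estimatedValidateSeconds(segmentMb) > timeoutSeconds;
    }

    // Largest segment (in MB) whose estimated validation fits in the timeout.
    static double maxSegmentMbWithin(double timeoutSeconds) {
        return mbPerSecond() * timeoutSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("throughput: %.1f MB/s, %.0f txns/s%n",
                mbPerSecond(), SEGMENT_TXNS / OBSERVED_VALIDATE_SECONDS);
        System.out.printf("largest segment within %.0fs timeout: %.0f MB%n",
                CREATE_NEW_EPOCH_TIMEOUT_SECONDS,
                maxSegmentMbWithin(CREATE_NEW_EPOCH_TIMEOUT_SECONDS));
        System.out.println("327MB segment exceeds timeout: "
                + exceedsTimeout(SEGMENT_MB, CREATE_NEW_EPOCH_TIMEOUT_SECONDS));
    }
}
```

Under that linear assumption, any segment much above roughly 300MB would be expected to hit the same timeout on comparable hardware.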
