hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
Date Tue, 27 Dec 2011 06:00:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176094#comment-13176094 ]

Aaron T. Myers commented on HDFS-2692:

The patch largely looks great, Todd. Thanks a lot for figuring this out and writing such excellent
tests. I also manually ran the test scenario I described to you, and confirmed that
it now works as expected with the patch applied.

The following comments are mostly nits. Feel free to address all but the first 2 in a separate

# In {{FSEditLogLoader#loadFSEdits}}, should we really be unconditionally calling {{FSNamesystem#notifyGenStampUpdate}}
in the {{finally}} block? What if an error occurs and {{maxGenStamp}} is never updated in
{{FSEditLogLoader#loadEditRecords}}?
# Spelling: "Initiatling" should be "Initiating" in {{TestHASafeMode#testComplexFailoverIntoSafemode}}
# In {{FSNamesystem#notifyGenStampUpdate}}, the log message could be clearer, and the log level
should probably not be info: {{LOG.info("=> notified of genstamp update for: " + gs);}}
# Why is {{SafeModeInfo#doConsistencyCheck}} costly? It doesn't seem like it should be. If
it's not in fact expensive, we might as well make it run regardless of whether or not asserts
are enabled.
# Is there really no better way to check if assertions are enabled?
# Rather than increase the visibility of {{TestDNFencing#(waitForTrueReplication,getTrueReplication,waitForDNDeletions,waitForNNToIssueDeletions)}}
for use in {{TestHASafeMode}}, seems like they should all be made member methods and moved
to {{MiniDFSCluster}}.
# Also seems like {{TestEditLogTailer#waitForStandbyToCatchUp}} should be moved to {{MiniDFSCluster}}.
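
To make the first and fifth comments concrete, here is a minimal sketch. It is not the actual HDFS code: {{ReviewSketch}}, {{NO_GENSTAMP}}, {{lastNotified}} and the simplified {{loadEdits}} are hypothetical stand-ins, and only the names {{notifyGenStampUpdate}} and {{maxGenStamp}} are borrowed from the patch. It shows one possible way to notify from the {{finally}} block only when a generation stamp was actually read, plus the common Java idiom for detecting whether assertions are enabled.

```java
import java.util.List;

/** Hypothetical stand-in for illustration; not the real HDFS classes. */
public class ReviewSketch {
    /** Sentinel meaning "no generation stamp was seen while loading". */
    static final long NO_GENSTAMP = Long.MIN_VALUE;

    /** Records the last value passed to the (hypothetical) notification hook. */
    static long lastNotified = NO_GENSTAMP;

    static void notifyGenStampUpdate(long gs) {
        lastNotified = gs;
    }

    /**
     * Walks the records and tracks the highest generation stamp seen.
     * A negative value simulates a corrupt edit and aborts the load.
     */
    static void loadEdits(List<Long> genStamps) {
        long maxGenStamp = NO_GENSTAMP;
        try {
            for (long gs : genStamps) {
                if (gs < 0) {
                    throw new IllegalStateException("corrupt edit record");
                }
                maxGenStamp = Math.max(maxGenStamp, gs);
            }
        } finally {
            // Notify only if at least one stamp was read, so a failure
            // before any update does not publish a meaningless value.
            if (maxGenStamp != NO_GENSTAMP) {
                notifyGenStampUpdate(maxGenStamp);
            }
        }
    }

    /** The assignment inside the assert only executes when asserts are on. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }
}
```

Whether a partially loaded segment should still trigger the notification is a design decision for the patch; the sketch notifies with whatever maximum was reached before the failure. For the assertion check, {{Class#desiredAssertionStatus}} is a standard alternative to the side-effecting {{assert}}.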
> HA: Bugs related to failover from/into safe-mode
> ------------------------------------------------
>                 Key: HDFS-2692
>                 URL: https://issues.apache.org/jira/browse/HDFS-2692
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2692.txt, hdfs-2692.txt
> In testing I saw an AssertionError come up several times when I was trying to do failover
between two NNs where one or the other was in safe-mode. Need to write some unit tests to
try to trigger this -- hunch is it has something to do with the treatment of "safe block count"
while tailing edits in safemode.


