hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2982) Startup performance suffers when there are many edit log segments
Date Mon, 21 May 2012 18:19:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280332#comment-13280332 ]

Colin Patrick McCabe commented on HDFS-2982:

I renamed the resync parameter to skipBrokenEdits, since that is its name in Reader#readOp,
and this function just passes it through.  The new name is a pretty concise description of
what it does.

The changes to TestNameNodeRecovery are for correctness.  Formerly, we were doing multiple
mkdirs operations on the same directory.  This resulted in only one mkdir operation getting
added to the stream.  Then when we corrupted the last edit, that mkdir operation was lost,
which is a bad thing, since we check for it later.
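The effect described above can be shown with a toy model.  This is only a sketch and not HDFS
code: the class ToyNamespace, its editLog field, and the "MKDIR" record format are all
hypothetical.  It assumes only what the paragraph above states, namely that repeating mkdirs
on an existing directory adds nothing further to the edit stream.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not HDFS code): a namespace that logs an edit
// only when mkdirs actually creates a directory.
class ToyNamespace {
    private final Set<String> dirs = new HashSet<>();
    final List<String> editLog = new ArrayList<>();

    boolean mkdirs(String path) {
        // Set.add returns true only if the path was not already present.
        if (dirs.add(path)) {
            editLog.add("MKDIR " + path);
        }
        return true; // succeeds whether or not the directory already existed
    }

    public static void main(String[] args) {
        ToyNamespace same = new ToyNamespace();
        for (int i = 0; i < 3; i++) {
            same.mkdirs("/test");
        }
        System.out.println("same directory: " + same.editLog.size() + " edit(s)");

        ToyNamespace distinct = new ToyNamespace();
        for (int i = 0; i < 3; i++) {
            distinct.mkdirs("/test" + i);
        }
        System.out.println("distinct directories: " + distinct.editLog.size() + " edit(s)");
    }
}
```

With distinct directories, corrupting the last edit still leaves earlier mkdir operations in
the stream for the test to check.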

I'm not 100% sure whether calling cluster.waitActive() in this test is necessary, since we have
0 DataNodes.  However, we do it everywhere else, and consistency is a good thing.  Also, conceptually
what we want is for the NameNode to come up and be active.  It seems more robust to check
for that directly rather than assuming that no part of edit log loading happens in the background.

> Startup performance suffers when there are many edit log segments
> -----------------------------------------------------------------
>                 Key: HDFS-2982
>                 URL: https://issues.apache.org/jira/browse/HDFS-2982
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-2982.001.patch, HDFS-2982.002.patch, HDFS-2982.003.patch, HDFS-2982.004.patch,
>                      HDFS-2982.005.patch, HDFS-2982.006.patch
>
> For every one of the edit log segments, it seems like we are calling listFiles on the
> edit log directory inside of {{findMaxTransaction}}. This is killing performance, especially
> when there are many log segments and the directory is stored on NFS. It is taking several
> minutes to start up the NN when there are several thousand log segments present.
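
The general remedy for the pattern described in the issue is to list the directory once and
reuse the result for every segment.  The following is a minimal sketch of that idea, not the
patch attached to this issue: the class EditSegmentScan, its method names, and the assumed
file name layout "edits_<firstTxId>-<lastTxId>" are illustrative assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the HDFS implementation: one directory listing
// serves all segments instead of one listing per segment.
class EditSegmentScan {
    // Assumed name layout "edits_<firstTxId>-<lastTxId>", for illustration only.
    private static final Pattern SEGMENT = Pattern.compile("edits_(\\d+)-(\\d+)");

    /** Last transaction ID encoded in a segment name, or -1 if the name does not match. */
    static long lastTxId(String name) {
        Matcher m = SEGMENT.matcher(name);
        return m.matches() ? Long.parseLong(m.group(2)) : -1L;
    }

    /** Highest last-transaction ID among the given names, or -1 if none match. */
    static long maxTxId(List<String> names) {
        long max = -1L;
        for (String name : names) {
            max = Math.max(max, lastTxId(name));
        }
        return max;
    }

    /** Lists the directory exactly once; this is the expensive call on NFS. */
    static List<String> listOnce(File dir) {
        String[] names = dir.list(); // null if dir is not a readable directory
        List<String> out = new ArrayList<>();
        if (names != null) {
            Collections.addAll(out, names);
        }
        return out;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        List<String> names = listOnce(dir); // single listing, reused below
        System.out.println("max txid: " + maxTxId(names));
    }
}
```

With N segments this performs one directory listing rather than N, which matters most when
each listing is a round trip to a remote filesystem.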
