hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2982) Startup performance suffers when there are many edit log segments
Date Fri, 18 May 2012 06:34:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278610#comment-13278610 ]

Colin Patrick McCabe commented on HDFS-2982:

Lots and lots of unit tests would have to change if EditLogInputStream started requiring
an init() call, not to mention the subtle bugs that might crop up.  That alone would almost
be worth its own patch.  Let's deal with this later if we decide it's something worth doing.
Frankly, I would argue against it because I think there are better APIs we could design.  In
particular, an API that separates the concept of a stream from the concept of a stream location
is much more efficient and results in cleaner code, because the invariant that you can't use
something without initializing it is then enforced by the type system.  So basically, can
we revisit this idea later, as in after this week?
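
To make the stream-versus-location idea concrete, here is a minimal sketch, assuming a design where a location object is the only way to get a stream. Every name in it (EditLogLocation, OpenEditLog, InMemoryLocation, readTxId) is hypothetical and chosen for illustration; none of this is the existing EditLogInputStream API or anything from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical names throughout; this is NOT the HDFS EditLogInputStream API.

/** A cheap, immutable description of where a log segment lives. Holds no open resources. */
interface EditLogLocation {
    String name();

    /** The only way to obtain a stream, so every stream a caller holds is already initialized. */
    OpenEditLog open() throws IOException;
}

/** An already-open stream. There is no init() method to forget. */
final class OpenEditLog implements AutoCloseable {
    private final DataInputStream in;

    OpenEditLog(InputStream raw) {
        this.in = new DataInputStream(raw);
    }

    /** Reads the next 8-byte transaction ID (an assumed record layout, for illustration only). */
    long readTxId() throws IOException {
        return in.readLong();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

/** In-memory location, present only to keep the sketch runnable without touching disk. */
final class InMemoryLocation implements EditLogLocation {
    private final String name;
    private final byte[] data;

    InMemoryLocation(String name, byte[] data) {
        this.name = name;
        this.data = data.clone();
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public OpenEditLog open() {
        return new OpenEditLog(new ByteArrayInputStream(data));
    }
}

final class StreamLocationDemo {
    public static void main(String[] args) throws IOException {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();
        EditLogLocation loc = new InMemoryLocation("segment-1", bytes);
        try (OpenEditLog log = loc.open()) {
            System.out.println(loc.name() + " first txid: " + log.readTxId());
        }
    }
}
```

With this shape, code that only needs to enumerate or sort segments can work with locations and never open a file, while code that reads transactions cannot get hold of an uninitialized stream at all.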

bq. The new test case is missing the @Test annotation so it won't run.

Will fix.

bq. Are the changes to validateEditLog necessary here? And the change to how corrupt files
are handled?

It's often really time-consuming to change these things because then I have to redo all the
unit tests.  Still, I will take a look at it.
> Startup performance suffers when there are many edit log segments
> -----------------------------------------------------------------
>                 Key: HDFS-2982
>                 URL: https://issues.apache.org/jira/browse/HDFS-2982
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-2982.001.patch
> For every one of the edit log segments, it seems like we are calling listFiles on the
> edit log directory inside of {{findMaxTransaction}}. This is killing performance, especially
> when there are many log segments and the directory is stored on NFS. It is taking several
> minutes to start up the NN when there are several thousand log segments present.
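
The general remedy for this kind of per-segment directory listing is to list the directory once and derive everything from that single listing. The sketch below illustrates that idea only; it is not the attached patch. SegmentScanner, Segment, and maxTxId are hypothetical names, and the file name shape edits_<firstTxId>-<lastTxId> is an assumption made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not code from HDFS or from the patch.
final class SegmentScanner {
    // Assumed file name shape for this sketch: edits_<firstTxId>-<lastTxId>
    private static final Pattern SEGMENT = Pattern.compile("edits_(\\d+)-(\\d+)");

    /** Transaction range parsed from one segment file name. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;

        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /** Lists the directory exactly once and parses every segment from that one listing. */
    static List<Segment> scan(File dir) throws IOException {
        File[] files = dir.listFiles(); // single directory read, however many segments exist
        if (files == null) {
            throw new IOException("Cannot list directory " + dir);
        }
        List<Segment> segments = new ArrayList<>();
        for (File f : files) {
            Matcher m = SEGMENT.matcher(f.getName());
            if (m.matches()) {
                segments.add(new Segment(Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
            }
        }
        return segments;
    }

    /** Highest transaction ID across the segments, or -1 if there are none (sentinel chosen for this sketch). */
    static long maxTxId(List<Segment> segments) {
        long max = -1L;
        for (Segment s : segments) {
            max = Math.max(max, s.lastTxId);
        }
        return max;
    }
}

final class SegmentScannerDemo {
    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("edits-demo").toFile();
        new File(dir, "edits_1-5").createNewFile();
        new File(dir, "edits_6-10").createNewFile();
        List<SegmentScanner.Segment> segments = SegmentScanner.scan(dir);
        System.out.println("max txid: " + SegmentScanner.maxTxId(segments));
    }
}
```

Listing once keeps the number of directory reads constant instead of growing with the segment count, which matters most when each listing is a round trip to NFS.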


