hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2982) Startup performance suffers when there are many edit log segments
Date Fri, 18 May 2012 06:57:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278619#comment-13278619 ]

Colin Patrick McCabe commented on HDFS-2982:
--------------------------------------------

bq. The javadoc for JournalSet#selectInputStreams is a little over-simplified =) - how about
describing the algorithm (get the streams starting with fromTxid from all managers, return
a list sorted by the starting txid etc)

Ok, will add.

bq. In EditLogFileInputStream#init why only close the stream that threw?

Yeah, I guess closing an already-closed stream should be harmless, since close() is supposed
to be idempotent, at least if the streams correctly implement the Closeable interface.
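
To illustrate the point about idempotent close(), here is a minimal sketch of closing every
stream instead of only the one that threw.  CloseAllSketch, CountingStream and closeAll are
hypothetical names made up for this example; none of this is code from the patch.  The only
thing it relies on is the java.io.Closeable contract, which says that closing an already
closed stream has no effect.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

final class CloseAllSketch {
    /** Hypothetical stream whose close() is idempotent, as the Closeable contract asks. */
    static final class CountingStream implements Closeable {
        private boolean closed = false;
        int releases = 0; // how many times the underlying resource was actually released

        @Override
        public void close() throws IOException {
            if (closed) {
                return; // second and later calls have no effect
            }
            closed = true;
            releases++;
        }
    }

    /** Closes every stream; remembers the first failure instead of stopping early. */
    static void closeAll(List<? extends Closeable> streams) throws IOException {
        IOException first = null;
        for (Closeable c : streams) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

With streams like these, calling closeAll on a list that includes an already closed stream
releases each underlying resource exactly once.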

bq. In TestEditLog readAllEdits is dead code

ok

bq. How about describing the high-level approach in the patch?

From a high level, this patch is about getting rid of two APIs in JournalManager, getNumberOfTransactions
and getInputStream, and adding one API to JournalManager, selectInputStreams.  The new API
simply gathers up all the available streams in one go and puts them into a Collection.  This
is more efficient than fetching the streams one at a time, and also better for some of the changes
we'd like to make in the future, like supporting overlapping edit log streams.
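
As a rough sketch of the gather-then-sort idea described above (and requested for the
JournalSet#selectInputStreams javadoc), something like the following could work.  The types
LogStream, Manager and ListManager, and the method selectAll, are simplified stand-ins invented
for this example and do not match the real signatures in the patch.  It also assumes that a
segment qualifies when its last txid is at or after fromTxId.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class SelectStreamsSketch {
    /** Hypothetical stand-in for an edit log stream; only the txid range is modeled. */
    static final class LogStream {
        final long firstTxId;
        final long lastTxId;

        LogStream(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /** Hypothetical, simplified manager interface (not the real JournalManager API). */
    interface Manager {
        void selectInputStreams(Collection<LogStream> out, long fromTxId);
    }

    /** A manager backed by an in-memory list of segments. */
    static final class ListManager implements Manager {
        private final List<LogStream> segments;

        ListManager(List<LogStream> segments) {
            this.segments = segments;
        }

        @Override
        public void selectInputStreams(Collection<LogStream> out, long fromTxId) {
            for (LogStream s : segments) {
                if (s.lastTxId >= fromTxId) { // segment contains fromTxId or starts later
                    out.add(s);
                }
            }
        }
    }

    /** Gathers the streams from every manager in one pass, then sorts by starting txid. */
    static List<LogStream> selectAll(List<? extends Manager> managers, long fromTxId) {
        List<LogStream> all = new ArrayList<>();
        for (Manager m : managers) {
            m.selectInputStreams(all, fromTxId);
        }
        // List.sort is stable, so streams with equal start txids keep manager order.
        all.sort(Comparator.comparingLong((LogStream s) -> s.firstTxId));
        return all;
    }
}
```

Because each manager is asked exactly once, the cost no longer grows with one directory
listing per segment, and overlapping streams from different managers simply end up next to
each other in the sorted list.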

Edit log validation is the process of finding out how far in-progress edit logs go.  We do
it during edit log finalization so that we can find out what to rename the in-progress edit
log file to.  ("validation" might not be a great name for this process, but it's probably
too late to change it now.)  We don't validate finalized logs.

There are some minor changes to validation here, and a major change.

First, the minor changes.  One change is to have the validation class contain only the end
txid, rather than the start txid, number of txids, and end txid.  The start txid is already
known, and the number of txids is not an independent count as you might think, but merely end - start
+ 1.  So it's good to get rid of that cruft.  Another minor change is that EditLogValidation#corruptionDetected
was renamed to EditLogValidation#hasCorruptHeader.  That is the concept it always represented:
it never referred to anything other than header corruption, and the rest of the code even
uses the terminology hasCorruptHeader to represent this info (see EditLogFile#hasCorruptHeader).
So I'm just trying to be consistent.
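
To make the "end - start + 1" point concrete, here is a tiny sketch of a validation result
that stores only the end txid and the header flag.  The class and method names
(ValidationResultSketch, Validation, numTransactions) are hypothetical and only loosely
modeled on the fields discussed above; the real class in the patch may look different.

```java
final class ValidationResultSketch {
    /** Hypothetical result type: only the end txid and the header flag are stored. */
    static final class Validation {
        final long endTxId;
        final boolean hasCorruptHeader;

        Validation(long endTxId, boolean hasCorruptHeader) {
            this.endTxId = endTxId;
            this.hasCorruptHeader = hasCorruptHeader;
        }
    }

    /** The count is derived on demand from the start txid the caller already knows. */
    static long numTransactions(long startTxId, Validation v) {
        return v.endTxId - startTxId + 1;
    }
}
```

For a segment that starts at txid 11 and validates to end txid 20, numTransactions gives 10,
so storing the count separately adds nothing.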

The major change is that we now read to the end of a corrupt file in validation, finding the
true end transaction rather than merely the first unreadable txid.  This is needed for recovery
to work properly on these files.  It's possible that this change could be dropped from this
patch.  Conceptually, it's more related to HDFS-3049.
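
The difference between the two validation behaviors can be sketched as follows.  This is a
deliberately simplified model, not the code in the patch: the log is assumed to be already
split into records, a null entry stands for an unreadable record, and CorruptScanSketch,
INVALID_TXID, endBeforeFirstCorruption and trueEnd are names invented for the example.  A real
implementation has to resynchronize on a byte stream, which is considerably harder.

```java
final class CorruptScanSketch {
    /** Hypothetical sentinel meaning "no readable transaction found". */
    static final long INVALID_TXID = -1;

    /** Old behavior in this model: stop at the first unreadable record. */
    static long endBeforeFirstCorruption(Long[] records) {
        long end = INVALID_TXID;
        for (Long txid : records) {
            if (txid == null) {
                break; // give up at the first corrupt record
            }
            end = txid;
        }
        return end;
    }

    /** New behavior in this model: skip unreadable records and read to the end. */
    static long trueEnd(Long[] records) {
        long end = INVALID_TXID;
        for (Long txid : records) {
            if (txid != null && txid > end) {
                end = txid; // highest readable txid seen so far
            }
        }
        return end;
    }
}
```

For records 11, 12, (unreadable), 14, 15 the first method reports 12 while the second reports
15, which is the value recovery needs in order to know how far the file really goes.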
                
> Startup performance suffers when there are many edit log segments
> -----------------------------------------------------------------
>
>                 Key: HDFS-2982
>                 URL: https://issues.apache.org/jira/browse/HDFS-2982
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-2982.001.patch
>
>
> For every one of the edit log segments, it seems like we are calling listFiles on the
> edit log directory inside of {{findMaxTransaction}}. This is killing performance, especially
> when there are many log segments and the directory is stored on NFS. It is taking several
> minutes to start up the NN when there are several thousand log segments present.
