hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-979) FSImage should specify which dirs are missing when refusing to come up
Date Thu, 30 Jun 2011 20:31:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058059#comment-13058059

Matt Foley commented on HDFS-979:

Sorry for the delay, I was at the Hadoop Summit.  Thanks for waiting.
In general, more specific logging is good.  I wanted to check that this enhancement does not
overlap with the reporting of bad dataDirs and editsDirs from e.g. the callers of  NNStorage#reportErrorsOnDirectories().
 I now believe there is no collision, so go for it!  Here are some additional comments on
this patch:


* Please use dataDirs.isEmpty() rather than dataDirs.size() == 0; same for editsDirs.  But
it might be best to also check that they are non-null before checking for emptiness.
* The enveloping check for "if (startOpt != StartupOption.IMPORT) {...}" concerns me, since
the old code did not have this check.  It's confusing because this method is called from both
loadFSImage() and doImportCheckpoint(), and in either case the dataDirs and editsDirs should
be non-null and non-empty.  Here's the question:  When IMPORT is set, and loadFSImage calls
recoverTransitionRead for the FIRST time (before the recursive call from doImportCheckpoint),
isn't it important that dataDirs and editsDirs be checked for non-emptiness?  I'm not sure,
because FSNamesystem#getStorageDirs does have a comment that says "//When importing image
from a checkpoint, the name-node can start with empty set of storage directories."  But it
also logs a warning that this is a really bad idea, since no edits log could be recorded.
 Can you explain why the old code (without this check) was considered okay?
* Looking further at FSNamesystem#getStorageDirs, when IMPORT is not set, it seems you'll
never get an empty directory list.  It does
if (dirNames.isEmpty())

* Comments have mis-spellings: editsDirs (not editDirs) and necessarily.
* Suggest that the dirType strings should be "fsimage" and "edits" rather than "data" and
"edit", in call from recoverTransitionRead().  Better yet, since this really should be an
enum, why not use the NameNodeDirType values IMAGE and EDITS?
* There's no need to pass configKey and directoriesToCheck args.  Since this is a single-use
method, keep the signature clean by fetching them within the getEmptyRecoveryIOExceptionMsg()
method, based on the dirType arg.
* In-line the value of "confValue".
* The test for dirsCollection.size() == 0 isn't necessary.  If dirsCollection isn't empty,
this method won't be called, right?  But if for some reason it gets called when dirsCollection
isn't empty, the log string would be suddenly terminated at "Specifically: ".  It's better
to just proceed with constructing the log string.
* In fact, why bother sending dirsCollection as an arg at all?  It will always be empty. 
The dirType has all the info needed.
* If dir.exists() print "EXISTS BUT IS APPARENTLY INACCESSIBLE", rather than just "EXISTS".
* Instead of limiting the method to URIs with scheme == "file", can you use the FileContext
stuff to test .exists() on the directories regardless of scheme?

> FSImage should specify which dirs are missing when refusing to come up
> ----------------------------------------------------------------------
>                 Key: HDFS-979
>                 URL: https://issues.apache.org/jira/browse/HDFS-979
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>            Assignee: Jim Plush
>            Priority: Minor
>             Fix For: 0.23.0
>         Attachments: HDFS-979-take1.txt, HDFS-979-take2.txt, HDFS-979-take3.txt
> When {{FSImage}} can't come up as either it has no data or edit dirs, it tells me this
> {code}
> java.io.IOException: All specified directories are not accessible or do not exist.
> {code}
> What it doesn't do is say which of the two attributes are missing. This would be beneficial
to anyone trying to track down the problem. Also, I don't think the message is correct. It's
bailing out because dataDirs.size() == 0 || editsDirs.size() == 0 , because a list is empty
-not because the dirs aren't there, as there hasn't been any validation yet.
> More useful would be
> # Explicit mention of which attributes are null
> # Declare that this is because they are not in the config

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message