hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3049) During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt
Date Thu, 26 Apr 2012 02:01:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262316#comment-13262316
] 

Todd Lipcon commented on HDFS-3049:
-----------------------------------

- can you please add some comments to the code explaining the behavior of getInputStream?
eg perhaps a comment above the definition of "best" saying "the set of streams which all contain
the same number of transactions, starting from this point".
- I think some of the new INFO logs are better off being debug
- Am I correct in understanding that MergedEditLogInputStream only provides failover functionality
between multiple logs which contain the same exact set of transactions? In that case, I think
a better name is something like FailoverEditLogInputStream. Also, I think you should iterate
through the streams in the constructor and verify that they all have matching metadata (version,
first tx, last tx, length), throwing an IllegalArgumentException if not.
- You can use {{MultipleIOException}} to group all the exception causes together
- The new class needs an interface audience annotation (Private) - or can it be made package-private?
- If you have an error on the very first transaction read, then {{needSkip}} gets set true
but {{prevTxId}} is still {{INVALID_TXID}} right? Maybe instead initialize {{prevTxId}} to
{{firstTxId-1}}? Or make the variable be {{nextTxId}}?
- Can you add some comments to nextOp()? I am having trouble following the control flow --
why don't you just do the "skipping" in the error handling path instead of using this "needSkip"
flag?

- In the exceptions you throw, you should include the name of the underlying stream
- the debug message "now trying to read from" should probably be combined with the "got error"
message above.
- Is it really worth adding a new parameter to MiniDFSCluster for this, instead of just manually
setting the name/edit dir configs for the few test cases where we need to ensure only a single
dir?

- Refactor out the code to corrupt a log file, since it now appears multiple times, and will
make the test easier to digest
- No need to catch the exception and fail() in the test case - if you just let the exception
fall through, it will fail on its own

{code}
+        public boolean accept(File dir, String name) {
+          if (name.startsWith(NNStorage.getFinalizedEditsFileName(startErrorTxId, 
+                                  endErrorTxId))) {
+            return true;
+          }
+          return false;
+        }
{code}

why does this have to be so complicated? How would we ever have more than one file starting
with edits_N-M in the directory? i.e. why not just do: {{{File editLog = new File(new File(f1,
"current"), NNStorage.getFinalized...)))}}

                
> During the normal loading NN startup process, fall back on a different EditLog if we
see one that is corrupt
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3049
>                 URL: https://issues.apache.org/jira/browse/HDFS-3049
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-3049.001.patch, HDFS-3049.002.patch, HDFS-3049.003.patch
>
>
> During the NameNode startup process, we load an image, and then apply edit logs to it
until we believe that we have all the latest changes.  Unfortunately, if there is an I/O error
while reading any of these files, in most cases, we simply abort the startup process.  We
should try harder to locate a readable edit log and/or image file.
> *There are three main use cases for this feature:*
> 1. If the operating system does not honor fsync (usually due to a misconfiguration),
a file may end up in an inconsistent state.
> 2. In certain older releases where we did not use fallocate() or similar to pre-reserve
blocks, a disk full condition may cause a truncated log in one edit directory.
> 3. There may be a bug in HDFS which results in some of the data directories receiving
corrupt data, but not all.  This is the least likely use case.
> *Proposed changes to normal NN startup*
> * We should try a different FSImage if we can't load the first one we try.
> * We should examine other FSEditLogs if we can't load the first one(s) we try.
> * We should fail if we can't find EditLogs that would bring us up to what we believe
is the latest transaction ID.
> Proposed changes to recovery mode NN startup:
> we should list out all the available storage directories and allow the operator to select
which one he wants to use.
> Something like this:
> {code}
> Multiple storage directories found.
> 1. /foo/bar
>     edits__curent__XYZ          size:213421345       md5:2345345
>     image                                  size:213421345       md5:2345345
> 2. /foo/baz
>     edits__curent__XYZ          size:213421345       md5:2345345345
>     image                                  size:213421345       md5:2345345
> Which one would you like to use? (1/2)
> {code}
> As usual in recovery mode, we want to be flexible about error handling.  In this case,
this means that we should NOT fail if we can't find EditLogs that would bring us up to what
we believe is the latest transaction ID.
> *Not addressed by this feature*
> This feature will not address the case where an attempt to access the NameNode name directory
or directories hangs because of an I/O error.  This may happen, for example, when trying to
load an image from a hard-mounted NFS directory, when the NFS server has gone away.  Just
as now, the operator will have to notice this problem and take steps to correct it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message