hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3049) During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt
Date Tue, 15 May 2012 20:15:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276177#comment-13276177 ]

Todd Lipcon commented on HDFS-3049:

- You're still missing the license and annotations on RedundantEditLogInputStream.java

+    // and can't be pre-transactional.
+    for (EditLogInputStream s : streams) {
+      Preconditions.checkArgument(s.getFirstTxId() !=
+          HdfsConstants.INVALID_TXID);
+      Preconditions.checkArgument(s.getLastTxId() !=
+          HdfsConstants.INVALID_TXID);
+    }
Can you add a format string argument to these checks so that, if one fails, it prints
s as a string? i.e., {{checkArgument(..., "bad stream: %s", s);}}
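For illustration, a minimal self-contained sketch of the suggested check. It is not the patch itself: {{LogStream}} is a hypothetical stand-in for EditLogInputStream, the local {{checkArgument}} helper only mirrors the shape of Guava's {{Preconditions.checkArgument(boolean, String, Object...)}} so the example runs without Guava, and the INVALID_TXID value is an assumption.

```java
import java.util.List;

final class StreamChecks {
    // Sentinel for "no valid transaction ID"; the concrete value is an assumption.
    static final long INVALID_TXID = -12345L;

    // Hypothetical stand-in for EditLogInputStream.
    static final class LogStream {
        final String name;
        final long firstTxId;
        final long lastTxId;

        LogStream(String name, long firstTxId, long lastTxId) {
            this.name = name;
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }

        @Override
        public String toString() {
            return name + "[" + firstTxId + ".." + lastTxId + "]";
        }
    }

    // Local helper shaped like Guava's checkArgument(boolean, String, Object...).
    static void checkArgument(boolean ok, String template, Object... args) {
        if (!ok) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    // Rejects streams without a valid first or last txid, naming the offender.
    static void validate(List<LogStream> streams) {
        for (LogStream s : streams) {
            checkArgument(s.firstTxId != INVALID_TXID, "bad stream: %s", s);
            checkArgument(s.lastTxId != INVALID_TXID, "bad stream: %s", s);
        }
    }
}
```

With the format argument in place, a failed check names the offending stream in the exception message instead of throwing a bare IllegalArgumentException.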

+    /* We sort the streams here so that the streams that end later come first.
+     */
Style: use // for inline comments (see above).
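A small sketch of the ordering that comment describes, written with a // inline comment. {{LogStream}} and {{laterEndingFirst}} are hypothetical names for this example only, not identifiers from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StreamOrderSketch {
    // Hypothetical stand-in for EditLogInputStream; only the last txid matters here.
    static final class LogStream {
        final String name;
        final long lastTxId;

        LogStream(String name, long lastTxId) {
            this.name = name;
            this.lastTxId = lastTxId;
        }
    }

    static List<LogStream> laterEndingFirst(List<LogStream> streams) {
        List<LogStream> sorted = new ArrayList<>(streams);
        // Sort so that the streams that end later come first.
        sorted.sort(Comparator.comparingLong((LogStream s) -> s.lastTxId).reversed());
        return sorted;
    }
}
```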

+        LOG.error("Got error reading edit log input stream " +
+          streams[curIdx].getName(), prevException);

Will it have already logged the offset of the error? Or will the exception itself contain
the offset? Otherwise we should include it in the error message.

- getPosition() in the merged stream now returns the position of the underlying stream, which
increases as we read one file and then resets back to zero. But, in FSEditLog, we track these
offsets for error reporting purposes. We need to make sure that, if there is an unrecoverable
corruption, the log messages specifically identify the path and offset of the corruption.
I'm not sure that's the case, now that we have the extra abstraction here. Can you try using
a single storage dir and corrupting the logs somewhere in a middle segment?
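To make the concern concrete, here is a rough sketch of a failover loop that records the name and offset of each failing stream and puts them in the final error. Everything in it is an assumption for illustration: the {{Source}} interface, {{FakeSource}}, and the fixed 10 bytes per op are invented and are not the FSEditLog or RedundantEditLogInputStream code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RedundantReadSketch {
    // Hypothetical minimal view of one edit log stream.
    interface Source {
        String getName();
        long getPosition();
        long readOp() throws IOException; // next txid, or -1 at end of stream
    }

    // Reads ops from the first source; on error, records where it failed and
    // continues from the next source. It never returns to a source that failed.
    static List<Long> readAll(List<? extends Source> sources) throws IOException {
        List<Long> applied = new ArrayList<>();
        List<String> failures = new ArrayList<>();
        long lastTxId = 0;
        for (Source src : sources) {
            try {
                long txId;
                while ((txId = src.readOp()) != -1) {
                    if (txId > lastTxId) { // skip ops already applied from an earlier source
                        applied.add(txId);
                        lastTxId = txId;
                    }
                }
                return applied;
            } catch (IOException e) {
                // Capture the path and offset while the failing stream is still known.
                failures.add(src.getName() + " at offset " + src.getPosition()
                    + ": " + e.getMessage());
            }
        }
        throw new IOException("no readable edit log; failures: " + failures);
    }

    // Test double: serves txids in order and fails when asked for op number failAt.
    static final class FakeSource implements Source {
        private final String name;
        private final long[] txIds;
        private final int failAt; // -1 means never fail
        private int next;

        FakeSource(String name, long[] txIds, int failAt) {
            this.name = name;
            this.txIds = txIds;
            this.failAt = failAt;
        }

        public String getName() { return name; }

        public long getPosition() { return next * 10L; } // pretend each op is 10 bytes

        public long readOp() throws IOException {
            if (next == failAt) throw new IOException("checksum mismatch");
            if (next == txIds.length) return -1;
            return txIds[next++];
        }
    }
}
```

The point of the sketch is that the offset is only meaningful together with the stream name, so both are captured at the moment of failure rather than read later from the merged stream.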

- Can you add a note to the javadoc for the redundant stream that it doesn't handle the "ping
pong" scenario? i.e., that if a segment has an error, we will discard that segment and then move
to the next one?

- Regarding memory usage: I'm afraid that each stream opened will end up maintaining a large
buffer, since it's generally wrapped with BufferedInputStream, and we use mark(100MB). Maybe
we should close each stream as soon as we finish with it, rather than waiting until the close()
call at the end. Have you tested loading a large edit log composed of many segments? e.g., a
1GB log in total, made of ten 100MB segments, on a NN with, say, a 1GB heap?
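A sketch of the "close as you go" idea, under stated assumptions: {{Segment}}, {{FakeSegment}}, and {{Tracker}} are hypothetical, and the small byte array only stands in for whatever buffer a real stream would hold.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

final class EagerCloseSketch {
    // Hypothetical view of one edit log segment.
    interface Segment extends Closeable {
        void readFully() throws IOException;
    }

    // Loads the segments in order, closing each one before opening the next,
    // so that at most one segment's buffer is live at any time.
    static int loadAll(List<? extends Segment> segments) throws IOException {
        int loaded = 0;
        for (Segment seg : segments) {
            try (Segment s = seg) {
                s.readFully();
                loaded++;
            }
        }
        return loaded;
    }

    // Counts how many fake segments hold a buffer at once.
    static final class Tracker {
        int open;
        int peak;
    }

    // Test double whose buffer is allocated on first read and released on close.
    static final class FakeSegment implements Segment {
        private final Tracker tracker;
        private byte[] buffer;

        FakeSegment(Tracker tracker) {
            this.tracker = tracker;
        }

        public void readFully() {
            buffer = new byte[1024];
            tracker.open++;
            tracker.peak = Math.max(tracker.peak, tracker.open);
        }

        public void close() {
            if (buffer != null) {
                buffer = null;
                tracker.open--;
            }
        }
    }
}
```

Closing inside the loop bounds the number of live buffers at one, whereas deferring every close to the end would keep one buffer per segment alive until loading finishes.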
> During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt
> ------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-3049
>                 URL: https://issues.apache.org/jira/browse/HDFS-3049
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-3049.001.patch, HDFS-3049.002.patch, HDFS-3049.003.patch, HDFS-3049.005.against3335.patch, HDFS-3049.006.against3335.patch, HDFS-3049.007.against3335.patch, HDFS-3049.010.patch, HDFS-3049.011.patch
> During the NameNode startup process, we load an image, and then apply edit logs to it
> until we believe that we have all the latest changes.  Unfortunately, if there is an I/O error
> while reading any of these files, in most cases, we simply abort the startup process.  We
> should try harder to locate a readable edit log and/or image file.
> *There are three main use cases for this feature:*
> 1. If the operating system does not honor fsync (usually due to a misconfiguration),
> a file may end up in an inconsistent state.
> 2. In certain older releases where we did not use fallocate() or similar to pre-reserve
> blocks, a disk full condition may cause a truncated log in one edit directory.
> 3. There may be a bug in HDFS which results in some of the data directories receiving
> corrupt data, but not all.  This is the least likely use case.
> *Proposed changes to normal NN startup*
> * We should try a different FSImage if we can't load the first one we try.
> * We should examine other FSEditLogs if we can't load the first one(s) we try.
> * We should fail if we can't find EditLogs that would bring us up to what we believe
> is the latest transaction ID.
> *Proposed changes to recovery mode NN startup*
> We should list all the available storage directories and allow the operator to select
> which one to use.
> Something like this:
> {code}
> Multiple storage directories found.
> 1. /foo/bar
>     edits__curent__XYZ          size:213421345       md5:2345345
>     image                                  size:213421345       md5:2345345
> 2. /foo/baz
>     edits__curent__XYZ          size:213421345       md5:2345345345
>     image                                  size:213421345       md5:2345345
> Which one would you like to use? (1/2)
> {code}
> As usual in recovery mode, we want to be flexible about error handling.  In this case,
> this means that we should NOT fail if we can't find EditLogs that would bring us up to what
> we believe is the latest transaction ID.
> *Not addressed by this feature*
> This feature will not address the case where an attempt to access the NameNode name directory
> or directories hangs because of an I/O error.  This may happen, for example, when trying to
> load an image from a hard-mounted NFS directory, when the NFS server has gone away.  Just
> as now, the operator will have to notice this problem and take steps to correct it.
