hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
Date Thu, 12 Jul 2012 23:37:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413326#comment-13413326 ]

Todd Lipcon commented on HDFS-3652:

This has data-loss implications as well. I am able to reproduce the following:

- NN is writing to three dirs: /data/1/nn, /data/2/nn, and /data/3/nn
- I modified the NN to inject an IOException when creating "edits.new" in /data/3/nn, which
causes "removeEditsForStorageDir" to get called inside {{rollEditLog}}
- Upon triggering a checkpoint:
-- all three logs are closed successfully
-- /data/1/nn and /data/2/nn are successfully opened for "edits.new"
-- /data/3/nn throws an IOE which gets caught. This calls {{removeEditsForStorageDir}}, which
removes the wrong stream (augmented logging):
12/07/12 16:23:54 INFO namenode.FSNamesystem: Roll Edit Log from
12/07/12 16:23:54 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms):
0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0 0
12/07/12 16:23:54 WARN namenode.FSNamesystem: Removing edits stream /tmp/name1/nn/current/edits.new
12/07/12 16:23:54 WARN common.Storage: Removing storage dir /tmp/name3/nn
java.io.IOException: Injected fault for /tmp/name3/nn/current/edits.new
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.<init>(FSEditLog.java:146)
- The NN is now _only_ writing to /tmp/name2/nn/current/edits.new, but considers both name1
and name2 to be good from a storage-directory standpoint. However, {{/tmp/name1/nn/current/edits.new}}
exists as an empty edit log file (just the header and preallocated region of 0xffs)
- When {{rollFSImage}} is called, it successfully calls {{close}} only on the name2 log -
which truncates it to the correct transaction boundary. Then it renames both {{name2/.../edits.new}}
and {{name1/.../edits.new}} to {{edits}}, and opens them both for append (assuming they've
been truncated to a transaction boundary).
- The NN is now writing to name1 and name2, but name1's log looks like this:

<valid header> <preallocated bytes of 0xffffffffffff.....> <transactions>

- Upon the next checkpoint, the 2NN will likely download this log, since it's listed first
in the name directory list. Upon doing so, it will see the 0xff at the head of the log and
not read any of the edits (which come after all of the 0xffs)
- The 2NN then uploads the "merged" image back to the NN, which blows away the "edits" file.
Thus, the NN's in-memory namespace is now out of sync with its on-disk data, and the next time
a checkpoint occurs or the NN restarts, it will fail.
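The zero-edit read in the sequence above can be illustrated with a toy reader. This is a hedged sketch, not the actual {{FSEditLog}} code: in the 1.x on-disk format the preallocation filler byte (0xff) is also OP_INVALID (-1), the reader's end-of-log marker, so a reader that stops at the first OP_INVALID never reaches transactions appended after an untruncated preallocated region.

```java
// Illustrative sketch of why the 2NN reads zero edits from name1's log.
// The constants and the simplified loop are assumptions for demonstration,
// not the real FSEditLog parser.
public class EditLogSketch {
    static final byte OP_INVALID = -1; // 0xff: preallocation filler doubles as end-of-log

    // Count opcodes until the first OP_INVALID byte, as a 1.x-style reader would.
    static int countOps(byte[] log, int offset) {
        int count = 0;
        for (int i = offset; i < log.length; i++) {
            if (log[i] == OP_INVALID) break; // reader treats 0xff as end-of-log
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // layout-version header, then the never-truncated 0xff region,
        // then two valid ops that the reader never reaches
        byte[] log = {0, 0, 0, 32,    // header
                      -1, -1, -1, -1, // preallocated 0xffs
                      9, 9};          // valid transactions after the filler
        System.out.println(countOps(log, 4)); // prints 0: the trailing edits are skipped
    }
}
```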

This is not an issue in trunk since the code was largely rewritten by HDFS-1073.

The workaround for existing users is simple: rename the directories so the terminal path components
differ, e.g. /data/1/nn1 and /data/2/nn2. The fix is also simple. I will upload the fix this afternoon.
> 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
> -------------------------------------------------------------------------------------
>                 Key: HDFS-3652
>                 URL: https://issues.apache.org/jira/browse/HDFS-3652
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.0.3, 1.1.0, 1.2.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
> In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams trying to
> find the stream corresponding to a given dir. To check equality, we currently use the following:
> {code}
>       File parentDir = getStorageDirForStream(idx);
>       if (parentDir.getName().equals(sd.getRoot().getName())) {
> {code}
> ... which is horribly incorrect. If two or more storage dirs happen to have the same
> terminal path component (e.g. /data/1/nn and /data/2/nn), it will pick the wrong stream(s)
> to remove.
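The name-based check quoted above can be contrasted with a full-path comparison. A minimal sketch (illustrative only, not the actual HDFS-3652 patch):

```java
import java.io.File;

public class StreamDirMatch {
    // Buggy 1.x-style check: compares only the last path component, so
    // /data/1/nn and /data/2/nn are indistinguishable.
    static boolean matchesByName(File parentDir, File sdRoot) {
        return parentDir.getName().equals(sdRoot.getName());
    }

    // Safer check (an assumption for illustration): compare full absolute paths,
    // which distinguishes dirs that share a terminal component.
    static boolean matchesByPath(File parentDir, File sdRoot) {
        return parentDir.getAbsolutePath().equals(sdRoot.getAbsolutePath());
    }

    public static void main(String[] args) {
        File a = new File("/data/1/nn");
        File b = new File("/data/2/nn");
        System.out.println(matchesByName(a, b)); // prints true: the wrong stream matches
        System.out.println(matchesByPath(a, b)); // prints false
    }
}
```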

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

