hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2305) Running multiple 2NNs can result in corrupt file system
Date Tue, 13 Sep 2011 17:21:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103778#comment-13103778

Aaron T. Myers commented on HDFS-2305:

I ran the full test suite on my local box. The only test which failed was {{org.apache.hadoop.hdfs.security.TestDelegationToken}},
which per HADOOP-7625 is known to be failing on branch-0.20-security.

I'll commit this to branch-0.20-security in the next hour or so if there are no objections.

Thanks a lot for the review, Todd.

> Running multiple 2NNs can result in corrupt file system
> -------------------------------------------------------
>                 Key: HDFS-2305
>                 URL: https://issues.apache.org/jira/browse/HDFS-2305
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.2
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-2305-test.patch, hdfs-2305.0.patch, hdfs-2305.1.patch
> Here's the scenario:
> * You run the NN and 2NN (2NN A) on the same machine.
> * You don't have the address of the 2NN configured, so it's defaulting to
> * There's another 2NN (2NN B) running on a second machine.
> * When a 2NN is done checkpointing, it says "hey NN, I have an updated fsimage for you.
You can download it from this URL, which includes my IP address, which is x"
> And here's the steps that occur to cause this issue:
> # Some edits happen.
> # 2NN A (on the NN machine) does a checkpoint. All is dandy.
> # Some more edits happen.
> # 2NN B (on a different machine) does a checkpoint. It tells the NN "grab the newly-merged
fsimage file from"
> # NN happily grabs the fsimage from 2NN A (the 2NN on the NN machine), which is stale.
> # NN renames edits.new file to edits. At this point the in-memory FS state is fine, but
the on-disk state is missing edits.
> # The next time a 2NN (any 2NN) tries to do a checkpoint, it gets an up-to-date edits
file, with an outdated fsimage, and tries to apply those edits to that fsimage.
> # Kaboom.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message