hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2305) Running multiple 2NNs can result in corrupt file system
Date Thu, 01 Sep 2011 01:31:10 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095063#comment-13095063 ]

Aaron T. Myers commented on HDFS-2305:
--------------------------------------

I should've mentioned - this patch is for branch-0.20-security.

> Running multiple 2NNs can result in corrupt file system
> -------------------------------------------------------
>
>                 Key: HDFS-2305
>                 URL: https://issues.apache.org/jira/browse/HDFS-2305
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.2
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-2305-test.patch
>
>
> Here's the scenario:
> * You run the NN and 2NN (2NN A) on the same machine.
> * You don't have the address of the 2NN configured, so it's defaulting to 127.0.0.1.
> * There's another 2NN (2NN B) running on a second machine.
> * When a 2NN is done checkpointing, it says "hey NN, I have an updated fsimage for you. You can download it from this URL, which includes my IP address, which is x"
> And here are the steps that cause this issue:
> # Some edits happen.
> # 2NN A (on the NN machine) does a checkpoint. All is dandy.
> # Some more edits happen.
> # 2NN B (on a different machine) does a checkpoint. It tells the NN "grab the newly-merged fsimage file from 127.0.0.1"
> # NN happily grabs the fsimage from 2NN A (the 2NN on the NN machine), which is stale.
> # NN renames edits.new file to edits. At this point the in-memory FS state is fine, but the on-disk state is missing edits.
> # The next time a 2NN (any 2NN) tries to do a checkpoint, it gets an up-to-date edits file, with an outdated fsimage, and tries to apply those edits to that fsimage.
> # Kaboom.
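
The sequence above can be reproduced with a small toy model. This is only a sketch, not Hadoop code: the class names, method names, and host names below are hypothetical, and an fsimage is modelled simply as the list of edits it already contains. It assumes, as the scenario describes, that the NN resolves the advertised address from its own machine, so 127.0.0.1 always points at the 2NN running next to the NN.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LOOPBACK = "127.0.0.1"


@dataclass
class SecondaryNN:
    """Hypothetical stand-in for a 2NN; not a real Hadoop class."""
    name: str
    host: str
    advertised_addr: str = LOOPBACK
    merged_image: List[str] = field(default_factory=list)


@dataclass
class NameNode:
    """Hypothetical stand-in for the NN; an image is a list of applied edits."""
    host: str
    memory: List[str] = field(default_factory=list)   # in-memory FS state
    fsimage: List[str] = field(default_factory=list)  # on-disk image
    edits: List[str] = field(default_factory=list)    # on-disk edits file
    edits_new: Optional[List[str]] = None             # edits.new during a checkpoint

    def log_edit(self, op: str) -> None:
        self.memory.append(op)
        target = self.edits if self.edits_new is None else self.edits_new
        target.append(op)

    def start_checkpoint(self):
        # New edits go to edits.new while the 2NN merges fsimage + edits.
        self.edits_new = []
        return list(self.fsimage), list(self.edits)

    def finish_checkpoint(self, image: List[str]) -> None:
        # Install the downloaded image, then rename edits.new to edits.
        self.fsimage = list(image)
        self.edits = self.edits_new if self.edits_new is not None else []
        self.edits_new = None

    def on_disk_state(self) -> List[str]:
        return self.fsimage + self.edits


def resolve(from_host: str, addr: str, secondaries: List[SecondaryNN]) -> SecondaryNN:
    """Return the 2NN that `addr` reaches when dialled from `from_host`."""
    target_host = from_host if addr == LOOPBACK else addr
    for snn in secondaries:
        if snn.host == target_host:
            return snn
    raise LookupError(f"no 2NN reachable at {addr} from {from_host}")


def checkpoint(nn: NameNode, snn: SecondaryNN, secondaries: List[SecondaryNN]) -> None:
    image, edits = nn.start_checkpoint()
    snn.merged_image = image + edits  # the 2NN merges edits into the image
    # The 2NN tells the NN where to download from; the NN resolves it locally.
    source = resolve(nn.host, snn.advertised_addr, secondaries)
    nn.finish_checkpoint(source.merged_image)


def run_scenario(b_addr: str = LOOPBACK) -> NameNode:
    nn = NameNode(host="nn-host")
    snn_a = SecondaryNN("2NN A", host="nn-host")
    snn_b = SecondaryNN("2NN B", host="other-host", advertised_addr=b_addr)
    secondaries = [snn_a, snn_b]

    nn.log_edit("e1")
    nn.log_edit("e2")                   # some edits happen
    checkpoint(nn, snn_a, secondaries)  # 2NN A checkpoints; all is fine
    nn.log_edit("e3")                   # some more edits happen
    checkpoint(nn, snn_b, secondaries)  # 2NN B checkpoints
    return nn


if __name__ == "__main__":
    nn = run_scenario()
    print("in memory:", nn.memory)
    print("on disk:  ", nn.on_disk_state())
```

In this model, after 2NN B's checkpoint the NN has installed 2NN A's stale image and discarded the old edits file, so the on-disk state no longer contains `e3` even though the in-memory state does. Calling `run_scenario(b_addr="other-host")`, where 2NN B advertises an address that actually reaches it, leaves the on-disk and in-memory states in agreement.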

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
