hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits
Date Tue, 09 Feb 2010 05:29:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831300#action_12831300 ]

Todd Lipcon commented on HDFS-955:
----------------------------------

I've verified the behavior from HDFS-909 on 0.20 (though I'm pretty certain it also exists
on trunk).

To reproduce, I did a little manual "fault injection": after saving IMAGE_NEW in saveFSImage, I added
{code}
// Simulate a NameNode crash whenever the marker file exists.
if (new File("/tmp/savefsimage.die").exists()) {
  System.exit(1);
}
{code}
I then did the following sequence:

- start NN
- hadoop fs -mkdir test1
- hadoop dfsadmin -safemode enter
- touch /tmp/savefsimage.die
- hadoop dfsadmin -saveNamespace
- (NN "crashes")

This leaves dfs.name.dir/current as:
{noformat}
-rw-r--r-- 1 todd todd   4 2010-02-08 21:24 edits
-rw-r--r-- 1 todd todd   4 2010-02-08 21:24 edits.new
-rw-r--r-- 1 todd todd  94 2010-02-08 21:24 fsimage
-rw-r--r-- 1 todd todd 323 2010-02-08 21:24 fsimage.ckpt
-rw-r--r-- 1 todd todd   8 2010-02-08 21:24 fstime
-rw-r--r-- 1 todd todd 100 2010-02-08 21:24 VERSION
{noformat}

(fsimage.ckpt has the proper image including my directory)
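To make the crash state easier to follow, here is a rough Java model of the files in dfs.name.dir/current during saveNamespace. It is a sketch, not Hadoop code: the class and method names are hypothetical, fstime and VERSION are ignored, and each file is reduced to the list of paths it records. The step order (write fsimage.ckpt, reset edits and edits.new, then roll) is an assumption chosen to match the listing above, where both edit logs are down to 4 bytes (assumed here to mean empty).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of dfs.name.dir/current during
 * "dfsadmin -saveNamespace". Each file maps to the paths it records:
 * image entries for fsimage files, mkdir ops for edits files.
 * Not Hadoop's API; fstime and VERSION are left out.
 */
class SaveNamespaceModel {
    final Map<String, List<String>> current = new TreeMap<>();

    SaveNamespaceModel() {
        // Last successful checkpoint holds only the root; the mkdir lives in edits.
        current.put("fsimage", new ArrayList<>(List.of("/")));
        current.put("edits", new ArrayList<>(List.of("test1")));
    }

    /** Runs the assumed save steps, optionally "crashing" before the final roll. */
    void saveNamespace(boolean crashBeforeRoll) {
        // Make sure an (empty) edits.new exists.
        current.putIfAbsent("edits.new", new ArrayList<>());
        // Write the merged namespace (image + edits) to fsimage.ckpt (IMAGE_NEW).
        List<String> merged = new ArrayList<>(current.get("fsimage"));
        merged.addAll(current.get("edits"));
        current.put("fsimage.ckpt", merged);
        // Reset both edit logs: their contents now exist only in fsimage.ckpt.
        current.put("edits", new ArrayList<>());
        current.put("edits.new", new ArrayList<>());
        if (crashBeforeRoll) {
            return; // stands in for the injected System.exit(1)
        }
        // Roll: promote fsimage.ckpt to fsimage and drop edits.new.
        current.put("fsimage", current.remove("fsimage.ckpt"));
        current.remove("edits.new");
    }

    public static void main(String[] args) {
        SaveNamespaceModel crashed = new SaveNamespaceModel();
        crashed.saveNamespace(true);
        // prints {edits=[], edits.new=[], fsimage=[/], fsimage.ckpt=[/, test1]}
        System.out.println(crashed.current);
    }
}
```

In this model the crash leaves the same four files as the listing, and test1 is recorded only in fsimage.ckpt.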

If I now remove the fault injection file and start the NN, it "recovers" to:
{noformat}
-rw-r--r-- 1 todd todd   4 2010-02-08 21:25 edits
-rw-r--r-- 1 todd todd  94 2010-02-08 21:25 fsimage
-rw-r--r-- 1 todd todd   8 2010-02-08 21:25 fstime
-rw-r--r-- 1 todd todd 100 2010-02-08 21:25 VERSION
{noformat}
(i.e., all edits since the last successful checkpoint were lost)
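The "recovery" can be sketched the same way. The rule below is inferred from the two listings, not copied from Hadoop: when fsimage.ckpt exists alongside edits.new, the checkpoint is treated as possibly incomplete and deleted; without edits.new it is promoted to fsimage. The NameNode is then modeled as re-saving whatever it loaded. The class and method names are hypothetical, and each file is again reduced to the list of paths it records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model of the startup recovery inferred from the listings. */
class RecoveryModel {
    /** Recovers the directory in place and returns the namespace that gets loaded. */
    static TreeSet<String> recoverAndLoad(Map<String, List<String>> current) {
        if (current.containsKey("fsimage.ckpt")) {
            if (current.containsKey("edits.new")) {
                // Assumed rule: checkpoint may be incomplete, so discard it.
                current.remove("fsimage.ckpt");
            } else {
                // Assumed rule: checkpoint finished, so promote it.
                current.put("fsimage", current.remove("fsimage.ckpt"));
            }
        }
        // Load: image, then replay edits, then edits.new.
        TreeSet<String> namespace = new TreeSet<>(current.getOrDefault("fsimage", List.of()));
        namespace.addAll(current.getOrDefault("edits", List.of()));
        namespace.addAll(current.getOrDefault("edits.new", List.of()));
        // Re-save what was loaded: fresh fsimage, empty edits, no edits.new.
        current.put("fsimage", new ArrayList<>(namespace));
        current.put("edits", new ArrayList<>());
        current.remove("edits.new");
        return namespace;
    }

    public static void main(String[] args) {
        // Crash state from the first listing (fstime and VERSION omitted).
        Map<String, List<String>> current = new TreeMap<>();
        current.put("fsimage", new ArrayList<>(List.of("/")));
        current.put("fsimage.ckpt", new ArrayList<>(List.of("/", "test1")));
        current.put("edits", new ArrayList<>());
        current.put("edits.new", new ArrayList<>());
        System.out.println(recoverAndLoad(current)); // prints [/]
        System.out.println(current);                 // prints {edits=[], fsimage=[/]}
    }
}
```

Because the edit logs were already reset before the crash, discarding fsimage.ckpt leaves no copy of the mkdir anywhere, which is why test1 disappears in this model.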

> FSImage.saveFSImage can lose edits
> ----------------------------------
>
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage function
> (implementing dfsadmin -saveNamespace) can corrupt the NN storage such that all current edits
> are lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

