hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3736) Failure in starting NN due to fsimage loading failure
Date Mon, 30 Jul 2012 20:13:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425179#comment-13425179 ]

Todd Lipcon commented on HDFS-3736:

Hi Suja. Can you also confirm that moving aside (or deleting) the fsimage file results in
a successful startup? Given we retain multiple images, I think it should start up OK after
this manual tweak.

That said, I agree we should be able to fall back to the earlier available image if the newest
one has a missing (or mismatched) md5sum.
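
To make the fallback idea concrete, here is a minimal sketch, not the actual NameNode code. It assumes images are named fsimage_<txid>, that each has a sidecar file named <image>.md5 whose first whitespace-separated token is the hex digest, and that the newest image with a matching digest should be chosen. The class and method names (FsImageFallback, newestValidImage, hasValidMd5) are hypothetical.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: illustrates the fallback policy only, not Hadoop's implementation.
final class FsImageFallback {

    // Hex MD5 of a file. Reads the whole file at once, which is fine for a sketch;
    // a real image should be streamed.
    static String md5Hex(Path file) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(Files.readAllBytes(file));
            return String.format("%032x", new BigInteger(1, md.digest()));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if the assumed "<image>.md5" sidecar exists and its digest matches.
    static boolean hasValidMd5(Path image) throws IOException {
        Path sidecar = image.resolveSibling(image.getFileName() + ".md5");
        if (!Files.isRegularFile(sidecar)) {
            return false; // missing md5sum, as in the reported failure
        }
        String content = new String(Files.readAllBytes(sidecar), StandardCharsets.UTF_8).trim();
        if (content.isEmpty()) {
            return false;
        }
        String expected = content.split("\\s+")[0];
        return expected.equalsIgnoreCase(md5Hex(image)); // mismatched md5sum -> false
    }

    // Transaction id parsed from the assumed "fsimage_<txid>" file name.
    static long txId(Path image) {
        return Long.parseLong(image.getFileName().toString().substring("fsimage_".length()));
    }

    // Walk images from newest to oldest and return the first one that verifies.
    static Optional<Path> newestValidImage(Path nameDir) throws IOException {
        List<Path> images;
        try (Stream<Path> files = Files.list(nameDir)) {
            images = files
                .filter(p -> p.getFileName().toString().matches("fsimage_\\d{1,18}"))
                .collect(Collectors.toList());
        }
        images.sort((a, b) -> Long.compare(txId(b), txId(a))); // newest first
        for (Path image : images) {
            if (hasValidMd5(image)) {
                return Optional.of(image);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FsImageFallback <name-dir>");
            return;
        }
        Optional<Path> image = newestValidImage(Paths.get(args[0]));
        System.out.println(image.map(Path::toString).orElse("no loadable image found"));
    }
}
```

With this policy, a newest image whose md5 file was never renamed from its tmp name is simply skipped, and the previous retained image is used instead of failing startup.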
> Failure in starting NN due to fsimage loading failure
> -----------------------------------------------------
>                 Key: HDFS-3736
>                 URL: https://issues.apache.org/jira/browse/HDFS-3736
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, name-node
>            Reporter: suja s
> Came across a situation as follows in our test environment.
> NNs running in HA mode.
> While uploading a checkpoint, renaming the MD5 file from its tmp name to the actual file name
> failed for an unknown reason (a non-IO exception).
> At the same time, a connection timeout occurred on the standby side.
> This left a tmp MD5 file and the original fsimage file (the ckpt fsimage file was renamed
> successfully to the original fsimage file) in the name dir of the active NN.
> On NN restart, it checks for the MD5 file and, since it is not found, startup fails.
