hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-903) NN should verify images and edit logs on startup
Date Thu, 21 Oct 2010 04:36:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923315#action_12923315 ]

dhruba borthakur commented on HDFS-903:
---------------------------------------

I agree with Konstantin/Hairong that the MD5 signature should be part of the CheckpointSignature.


It would have been nice if the contents of the VERSION file were stored as a header record
at the beginning of the fsimage file itself. (I now remember the original reason why the VERSION
file exists separately from the fsimage: the datanode needs a VERSION file too for its block
directories, and the datanode does not have an fsimage file.) Given that, it should be fine to
store the checksum in the VERSION file. Also, the algorithm used to compute the checksum need not
be configurable; it could be hardcoded to generate an MD5 checksum.
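
To make the proposal concrete, here is a minimal sketch of recording an MD5 digest of the
fsimage in the VERSION file and checking it at startup. It is only an illustration under stated
assumptions, not the actual patch: the class name, the method names, and the property key
imageMD5Digest are hypothetical, and the VERSION file is treated as a plain java.util.Properties
file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Properties;

// Hypothetical helper; none of these names come from the HDFS code base.
public class ImageChecksumSketch {
    // Assumed property key under which the digest is kept in VERSION.
    public static final String DIGEST_KEY = "imageMD5Digest";

    // Stream the file through MD5 (hardcoded, not configurable) and return lowercase hex.
    public static String md5Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Reading is enough; DigestInputStream updates the digest as a side effect.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // After saving an image, store its digest alongside the other VERSION properties.
    public static void recordDigest(Path image, Path versionFile) throws IOException {
        Properties props = new Properties();
        if (Files.exists(versionFile)) {
            try (InputStream in = Files.newInputStream(versionFile)) {
                props.load(in);
            }
        }
        props.setProperty(DIGEST_KEY, md5Hex(image));
        try (OutputStream out = Files.newOutputStream(versionFile)) {
            props.store(out, null);
        }
    }

    // At startup, refuse to continue if the image does not match the recorded digest.
    public static void verifyOnStartup(Path image, Path versionFile) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(versionFile)) {
            props.load(in);
        }
        String expected = props.getProperty(DIGEST_KEY);
        String actual = md5Hex(image);
        if (!actual.equals(expected)) {
            throw new IOException("fsimage digest mismatch: expected " + expected
                    + " but computed " + actual);
        }
    }
}
```

With something like this, each storage directory's image could be checked against its own
VERSION file before any copy is trusted or used to overwrite the others.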

> NN should verify images and edit logs on startup
> ------------------------------------------------
>
>                 Key: HDFS-903
>                 URL: https://issues.apache.org/jira/browse/HDFS-903
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Eli Collins
>            Assignee: Hairong Kuang
>            Priority: Critical
>             Fix For: 0.22.0
>
>
> I was playing around with corrupting fsimage and edits logs when there are multiple dfs.name.dirs
> specified. I noticed that:
> * As long as the corruption does not make the image invalid (e.g. by changing an opcode into
> an invalid opcode), HDFS doesn't notice and happily uses a corrupt image or applies the
> corrupt edit.
> * If the first image in dfs.name.dir is "valid", it replaces the other copies in the other
> name.dirs with this first image, even if they are different. That is, if the first image is
> actually invalid/old/corrupt metadata, then you've lost your valid metadata, which can result
> in data loss if the namenode garbage collects blocks that it thinks are no longer used.
> How about we maintain a checksum as part of the image and edit log, check those on
> startup, and refuse to start up if they are different? Or at least provide a configuration option
> to do so if people are worried about the overhead of maintaining checksums of these files.
> Even if we assume dfs.name.dir is reliable storage, this guards against operator errors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

