hadoop-hdfs-dev mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-903) NN should verify images and edit logs on startup
Date Fri, 15 Jan 2010 23:28:54 GMT
NN should verify images and edit logs on startup

                 Key: HDFS-903
                 URL: https://issues.apache.org/jira/browse/HDFS-903
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
            Reporter: Eli Collins
            Assignee: Eli Collins
            Priority: Critical

I was playing around with corrupting the fsimage and edits logs when multiple dfs.name.dirs
are specified. I noticed that:
 * As long as the corruption does not make the image invalid (eg by changing an opcode to an
invalid one), HDFS doesn't notice and happily uses the corrupt image or applies the corrupt
edits.
 * If the first image in dfs.name.dir is "valid", it replaces the copies in the other
name.dirs with this first image, even if they differ. So if the first image actually contains
invalid/old/corrupt metadata, then you've lost your valid metadata, which can result in data
loss if the namenode garbage collects blocks that it thinks are no longer used.

How about we maintain a checksum as part of the image and edit log and check those on startup,
refusing to start if they differ. Or at least provide a configuration option to do so, for
people worried about the overhead of maintaining checksums of these files. Even if we assume
dfs.name.dir is reliable storage, this guards against operator errors.
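To make the proposal concrete, here is a minimal sketch of the verify-on-startup idea. The class name, method names, and the choice of CRC32 are all illustrative assumptions, not the actual HDFS fsimage format or any existing NameNode code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper (not real HDFS code): the NN would persist a checksum
// alongside each fsimage/edits file and verify it before loading.
public class ImageChecksum {

    // Compute a CRC32 checksum over the file's full contents.
    static long checksumOf(Path file) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    // Compare the stored checksum against the file on disk and refuse
    // to proceed (throw) on mismatch, instead of loading corrupt metadata.
    static void verifyOrAbort(Path image, long storedChecksum) throws IOException {
        long actual = checksumOf(image);
        if (actual != storedChecksum) {
            throw new IOException("fsimage checksum mismatch: expected "
                    + storedChecksum + ", got " + actual + " for " + image);
        }
    }
}
```

With something like this, a silently flipped byte in one dfs.name.dir copy would fail verification there, and the NN could fall back to a copy whose checksum matches rather than propagating the corrupt image.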

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
