hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (HDFS-1800) Extend image checksumming to function with multiple fsimage files
Date Sat, 07 May 2011 02:11:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030244#comment-13030244
] 

Eli Collins edited comment on HDFS-1800 at 5/7/11 2:10 AM:
-----------------------------------------------------------

Approach #1 is fine with me. For #3 it's nice that the file is self-verifying and it's more
consistent with the edits log checksumming (admins already need a special tool to verify the
edits log). However it seems like the code will be simpler if the sum is stored separately
(as it already is) and it is nice an admin can use {{md5sum}} on the raw image files.

Updated patch looks good.  Comments/questions:
* Why is the md5 file format "hash\tfile" instead of the format ("hash *file") produced by
md5sum for binary files?
* MD5FilesUtil should live somewhere like o.a.h.h.util
* Per your question in NNStorage#attemptRestoreRemovedStorage I don't think we should automatically
blow away the contents of a failed storage directory.  
* Does the version checking removed in NNStorage#getFields get moved somewhere else?
* In openEditLog why do we unconditionally write the current txid?
* doMerge needs more params in the javadoc
* Will the TODOs, 0xDEADBEEF sums etc be resolved in this jira or another one?
* What case does the TODO in formatOccurred refer to? The given storage directory and fsimage
are out of sync?
* Is there a test that covers a directory with multiple checkpoints/checksum files?

      was (Author: eli):
    Approach #1 is fine with me. For #3 it's nice that the file is self-verifying and it's
more consistent with the edits log checksumming (admins already need a special tool to verify
the edits log). However it seems like the code will be simpler if the sum is stored separately
(as it already is) and it is nice an admin can use {{md5sum}} on the raw image files.

Updated patch looks good.  Comments/questions:
* Why is the md5 file format "hash\tfile" instead of the format ("hash *file") produced by
md5sum for binary files?
* MD5FilesUtil should live somewhere like o.a.h.h.util
* Per your question in NNStorage#attemptRestoreRemovedStorage I don't think we should automatically
blow away the contents of a failed storage directory.  
* Does the version checking removed in NNStorage#getFields get moved somewhere else?
* In openEditLog why do we unconditionally write the current txid?
* Will getLoadedImageMd5 ever return null (causing us to skip the check) in case eg where

* doMerge needs more params in the javadoc
* Will the TODOs, 0xDEADBEEF sums etc be resolved in this jira or another one?
* What case does the TODO in formatOccurred refer to? The given storage directory and fsimage
are out of sync?
* Is there a test that covers a directory with multiple checkpoints/checksum files?
  
> Extend image checksumming to function with multiple fsimage files
> -----------------------------------------------------------------
>
>                 Key: HDFS-1800
>                 URL: https://issues.apache.org/jira/browse/HDFS-1800
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-1800-prelim.txt, hdfs-1800.txt, hdfs-1800.txt
>
>
> HDFS-903 added the MD5 checksum of the fsimage to the VERSION file in each image directory.
This allows it to verify that the FSImage didn't get corrupted or accidentally replaced on
disk.
> With HDFS-1073, there may be multiple fsimage_N files in a storage directory corresponding
to different checkpoints. So having a single MD5 in the VERSION file won't suffice. Instead
we need to store an MD5 per image file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message