hadoop-hdfs-issues mailing list archives

From "Ravi Prakash (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2848) hdfs corruption appended to blocks is not detected by fs commands or fsck
Date Wed, 01 Feb 2012 21:38:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198176#comment-13198176 ]

Ravi Prakash commented on HDFS-2848:

Ouch! This JIRA is turning out to be more of a pickle than I expected. First, to answer my
own question (thanks, Bobby): no, the DN doesn't verify the checksum before serving up the
data. That responsibility falls to the client receiving the data. Presumably this
was done so that checksum verification is not done twice.
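
To make the client-side check concrete, here is a minimal sketch of per-chunk checksum
verification on the receiving side. It is not the HDFS client code: the class and method
names are hypothetical, and the 512-byte chunk size and the use of CRC32 are assumptions
made for illustration only.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ClientChecksumSketch {
    // Assumed chunk size for this sketch; the real value is a cluster setting.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 value per chunk of the given data. */
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Client-side check: recompute the checksums of the received bytes and compare. */
    static boolean verify(byte[] received, long[] expected) {
        return Arrays.equals(checksums(received), expected);
    }
}
```

The sender in this sketch never calls verify(); it only ships the bytes and the stored
checksums, so any corruption is noticed only when the receiver recomputes them.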

Now here's what I discovered.
1. Each datanode has a ReplicasMap. On startup, the DN loads the metadata about each block
into this map.
2. Every subsequent request for that metadata (e.g. request for the length of the block) is
fulfilled from this in-memory map. If the block file has been changed (e.g. appended to),
the ReplicasMap has no knowledge of the fact.

So when a client requests a block, the DN serves data only up to the original, uncorrupted
length. It's only after the DN is restarted that it serves up the corrupted data, which the
client then notices.
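
The behaviour described above can be reproduced with a small stand-alone sketch. This is
not DataNode code: StaleLengthSketch and its methods are hypothetical stand-ins, with a
plain HashMap playing the role of the in-memory replica metadata and a temp file playing
the role of the block file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StaleLengthSketch {
    // Hypothetical stand-in for the in-memory map: block file -> length seen at startup.
    private final Map<Path, Long> cachedLength = new HashMap<>();

    /** Simulates DN startup: record the on-disk length of the block file once. */
    void load(Path blockFile) throws IOException {
        cachedLength.put(blockFile, Files.size(blockFile));
    }

    /** Serves only as many bytes as the cached length allows, ignoring later growth. */
    byte[] serve(Path blockFile) throws IOException {
        long len = cachedLength.get(blockFile);
        byte[] onDisk = Files.readAllBytes(blockFile);
        return Arrays.copyOf(onDisk, (int) Math.min(len, onDisk.length));
    }

    /** Returns what is served before and after a simulated restart, joined by '|'. */
    static String demo() {
        try {
            Path block = Files.createTempFile("blk_", ".data");
            try {
                Files.write(block, "good".getBytes(StandardCharsets.UTF_8));
                StaleLengthSketch dn = new StaleLengthSketch();
                dn.load(block); // startup
                // Corrupt the block by appending bytes behind the map's back.
                Files.write(block, "BAD".getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.APPEND);
                String before = new String(dn.serve(block), StandardCharsets.UTF_8);
                dn.load(block); // simulated restart picks up the new length
                String after = new String(dn.serve(block), StandardCharsets.UTF_8);
                return before + "|" + after;
            } finally {
                Files.deleteIfExists(block);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "good|goodBAD"
    }
}
```

Before the simulated restart only the original bytes are served; after it, the appended
bytes are served too, which is the point at which a checksum-verifying reader would notice.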

I talked with Kihwal, and he told me we should figure out what happens during an append. Does
the corrupt data get overwritten? What happens when the block is corrupted, the DN is restarted,
and then data is appended to the block?

> hdfs corruption appended to blocks is not detected by fs commands or fsck
> -------------------------------------------------------------------------
>                 Key: HDFS-2848
>                 URL: https://issues.apache.org/jira/browse/HDFS-2848
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
> Courtesy Pat White [~patwhitey2007]
> {quote}
> Appears that there is a regression in corrupt block detection by both fsck and fs cmds
> like 'cat'. Testcases for pre-block and block-overwrite corruption of all replicas is
> correctly reporting errors however post-block corruption is not, fsck on the filesystem
> reports it's Healthy and 'cat' returns without error. Looking at the DN blocks themselves,
> they clearly contain the injected corruption pattern.
> {quote}

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira