hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7269) NN and DN don't check whether corrupted blocks reported by clients are actually corrupted
Date Tue, 21 Oct 2014 01:27:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177789#comment-14177789 ]

Tsz Wo Nicholas Sze commented on HDFS-7269:
-------------------------------------------

Per HDFS-1371, the client should not report a checksum failure when all the nodes are bad.  Do
the files in your case have only one replica?
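
For readers unfamiliar with the rule referred to above, here is a minimal sketch of the client-side decision. It assumes the policy is "report only when some, but not all, replicas failed the checksum, except that a single-replica block is still reported", which is what the single-replica question hints at. The class and method names are hypothetical and are not taken from DFSClient.

```java
import java.util.Set;

/** Hypothetical helper illustrating the reporting rule; not actual HDFS client code. */
final class ChecksumReportPolicy {
    private ChecksumReportPolicy() {}

    /**
     * Decides whether the client should report checksum failures to the NN.
     *
     * @param failedReplicas datanodes on which this client saw a checksum failure
     * @param totalReplicas  number of replicas the client knows about for the block
     */
    static boolean shouldReport(Set<String> failedReplicas, int totalReplicas) {
        int failed = failedReplicas.size();
        if (failed == 0) {
            return false; // nothing failed, nothing to report
        }
        if (failed < totalReplicas) {
            // At least one replica read cleanly, so the failures are probably real.
            return true;
        }
        // Every replica failed: the client itself is suspect (bad memory, bad NIC).
        // Assumption: with a single replica there is nothing to compare against,
        // so the failure is reported anyway.
        return totalReplicas == 1;
    }
}
```

Under this assumed rule, a client with faulty memory that fails the checksum on all three replicas of a block stays silent, while the same client reading a file with replication factor 1 would still file a report.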

> NN and DN don't check whether corrupted blocks reported by clients are actually corrupted
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-7269
>                 URL: https://issues.apache.org/jira/browse/HDFS-7269
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>
> We had a case where a client machine had a memory issue and therefore failed checksum
validation of a given block on all of its replicas. The client ended up informing the NN about
corrupted blocks on all DNs via reportBadBlocks. However, the block isn't corrupted on
any of the DNs; you can still use DFSClient to read it. To get rid of the NN's
warning message about the corrupt block, we have to either do an NN failover or repair the file by
a) copying the file somewhere, b) removing the file, and c) copying the file back.
> It would be useful if the NN and DN could validate the client's report. In fact, there is a comment
in NameNodeRpcServer about this.
> {noformat}
>   /**
>    * The client has detected an error on the specified located blocks 
>    * and is reporting them to the server.  For now, the namenode will 
>    * mark the block as corrupt.  In the future we might 
>    * check the blocks are actually corrupt. 
>    */
> {noformat}
> To allow the system to recover quickly from an invalid client report, we can support automatic
recovery or a manual admin command.
> 1. We can have the NN send a new DatanodeCommand such as ValidateBlockCommand. The DN will report
the validation result via IBR and a new ReceivedDeletedBlockInfo.BlockStatus.VALIDATED_BLOCK.
> 2. A new admin command to move corrupted blocks out of BM's CorruptReplicasMap and
UnderReplicatedBlocks.
> Appreciate any input.
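
To make option 1 in the quoted proposal concrete, here is a toy model of "validate before marking corrupt". All types below (FakeDatanode, FakeNamenode) are hypothetical stand-ins, not HDFS classes. The validation is done as a direct synchronous call, whereas the proposal has the DN answer asynchronously via IBR. CRC32 is used only as a convenient checksum for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.zip.CRC32;

/** Hypothetical model of the proposed flow; none of these types exist in HDFS. */
final class ValidateBeforeMarkingSketch {
    private ValidateBeforeMarkingSketch() {}

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    /** Stand-in for a DN holding replica bytes plus the checksum recorded at write time. */
    static final class FakeDatanode {
        private final Map<Long, byte[]> replicas = new HashMap<>();
        private final Map<Long, Long> storedChecksums = new HashMap<>();

        void store(long blockId, byte[] data) {
            replicas.put(blockId, data.clone());
            storedChecksums.put(blockId, crc(data));
        }

        /** Simulates on-disk corruption by flipping one bit (block must be non-empty). */
        void corrupt(long blockId) {
            replicas.get(blockId)[0] ^= 1;
        }

        /** Plays the role of handling a ValidateBlockCommand: recompute and compare. */
        boolean validate(long blockId) {
            byte[] data = replicas.get(blockId);
            return data != null && crc(data) == storedChecksums.get(blockId);
        }
    }

    /** Stand-in for the NN side: a client report is checked before it is believed. */
    static final class FakeNamenode {
        private final Set<String> corruptReplicas = new HashSet<>();

        void reportBadBlock(long blockId, String dnId, FakeDatanode dn) {
            if (!dn.validate(blockId)) {
                corruptReplicas.add(dnId + ":" + blockId);
            }
            // Otherwise the report is dropped: the replica is fine, the client was wrong.
        }

        boolean isMarkedCorrupt(long blockId, String dnId) {
            return corruptReplicas.contains(dnId + ":" + blockId);
        }
    }
}
```

In this model a bogus report from a client with bad memory never reaches the corrupt-replica set, so no failover or copy/remove/copy-back repair is needed afterwards. A genuinely damaged replica is still marked once the DN confirms the mismatch.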



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
