hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8126) hadoop fsck does not correctly check for corrupt blocks for a file
Date Fri, 10 Apr 2015 17:20:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14489954#comment-14489954 ]

Chris Nauroth commented on HDFS-8126:

Hello [~pradeep_bhadani].  It sounds like this is working as designed.

The fsck command does not directly verify the checksum of each block.  That would be an extremely
costly operation in a large cluster with many blocks.  Instead, checksum verification occurs
asynchronously from the fsck command.  It can happen either when a client attempts to read the
block (as you observed in your test case) or when the background block scanner thread running
on each DataNode detects an invalid checksum.  The corrupt status is then recorded in the
NameNode.  When the fsck command runs, it is just reporting the state of known corrupt blocks
from the NameNode.
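The division of labor described above can be sketched as a toy Python model (purely illustrative, not HDFS code; class and method names are my own): the NameNode only records corruption reports, fsck merely queries that recorded state, and it is a client read that actually verifies the checksum and files the report.

```python
import zlib

class NameNode:
    """Tracks which blocks have been *reported* corrupt; does no verification itself."""
    def __init__(self):
        self.corrupt_blocks = set()

    def report_corrupt(self, block_id):
        self.corrupt_blocks.add(block_id)

    def fsck(self, block_id):
        # fsck only consults recorded state -- it never re-reads block data.
        return "CORRUPT" if block_id in self.corrupt_blocks else "HEALTHY"

class DataNode:
    """Stores block data alongside a checksum computed at write time."""
    def __init__(self):
        self.blocks = {}

    def write(self, block_id, data):
        self.blocks[block_id] = (data, zlib.crc32(data))

    def read(self, block_id, namenode):
        data, stored_crc = self.blocks[block_id]
        # A client read recomputes the checksum and reports any mismatch.
        if zlib.crc32(data) != stored_crc:
            namenode.report_corrupt(block_id)
            raise IOError("checksum mismatch for " + block_id)
        return data

    def tamper(self, block_id):
        # Simulate manually editing the block file on the local filesystem.
        data, crc = self.blocks[block_id]
        self.blocks[block_id] = (data + b"x", crc)

nn, dn = NameNode(), DataNode()
dn.write("blk_1", b"hello")
dn.tamper("blk_1")
print(nn.fsck("blk_1"))       # still HEALTHY: nothing has verified the block yet
try:
    dn.read("blk_1", nn)      # the read detects the mismatch and reports it
except IOError:
    pass
print(nn.fsck("blk_1"))       # now CORRUPT
```

This matches the observed behavior: fsck reports HEALTHY after the tamper, and only flips to CORRUPT after a read has run the verification.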

Under the default configuration, the background block scanner thread does its own checksum
verification on each block only every 3 weeks, so in the test case you described, the block
scanner would not have had an opportunity to detect this.  However, when you attempted to read
the file, that triggered an immediate checksum verification during the read, which then reported
the checksum failure to the NameNode.
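The three-week default corresponds to the DataNode block scanner's scan period (504 hours = 21 days), which is configurable in hdfs-site.xml. A sketch of the setting, with the default value from hdfs-default.xml:

```xml
<!-- hdfs-site.xml: how often the DataNode block scanner re-verifies each block.
     The 504-hour (21-day) default is why the scanner had not yet detected the
     corruption in the test case above. -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>504</value>
</property>
```

Shortening this interval makes corruption show up in fsck sooner without a client read, at the cost of extra background I/O on every DataNode.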

If this explanation makes sense, would you please close the issue?  If you still think you're
seeing a bug in fsck, can you provide more details?  Thanks!

> hadoop fsck does not correctly check for corrupt blocks for a file
> ------------------------------------------------------------------
>                 Key: HDFS-8126
>                 URL: https://issues.apache.org/jira/browse/HDFS-8126
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: HDFS, hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Pradeep Bhadani
> hadoop fsck does not correctly check for corrupt blocks for a file until we try to read
> that file.
> Test steps (followed on a Cloudera CDH5.1 single-node VM and a Hortonworks HDP2.2 single-node
> VM):
> 1. Uploaded a file "test.txt" to /user/abc/test.txt on HDFS.
> 2. Ran "hadoop fsck /user/abc/test.txt -files -blocks" to check file integrity
> and retrieve the block id.
> 3. Searched for the block file location at the Linux filesystem level.
> 4. Manually edited the block file.
> 5. Re-ran the fsck command "hadoop fsck /user/abc/test.txt".
> 6. At this stage, FSCK still shows the file in HEALTHY state.
> 7. Waited more than 30 seconds and re-ran the FSCK test; it still shows a healthy state.
> 8. Tried to read the file with "hadoop fs -cat /user/abc/test.txt". This command fails with
> a checksum mismatch error (as expected).
> 9. Re-ran FSCK. Now FSCK shows that 1 block is corrupt.
> 10. Manually edited the file and restored it to its previous state.
> 11. Tried to cat the file. It works.
> 12. Ran the FSCK test. It still reports the block as corrupt.
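The failure at step 8 is consistent with HDFS's chunked checksum scheme: a CRC is stored for each fixed-size chunk of the block (512 bytes by default, per dfs.bytes-per-checksum), so any in-place edit makes the recomputed CRC disagree with the stored one, and restoring the original bytes makes verification pass again. A minimal sketch of that idea (illustrative only, not the actual HDFS read path; `chunk_crcs` and `verify` are made-up helper names):

```python
import zlib

CHUNK = 512  # default dfs.bytes-per-checksum chunk size

def chunk_crcs(data):
    """Per-chunk CRC32s, as computed when the block is written."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def verify(data, stored):
    """Reader-side verification: recompute and compare chunk by chunk."""
    return chunk_crcs(data) == stored

original = b"A" * 1300                 # spans three chunks
stored = chunk_crcs(original)          # checksums recorded at write time

edited = bytearray(original)
edited[600] = ord("B")                 # simulate manually editing the block file
print(verify(bytes(edited), stored))   # False: the middle chunk's CRC no longer matches

restored = bytes(original)             # step 10: restore the original bytes
print(verify(restored, stored))        # True: the data verifies again
```

This also mirrors steps 10-11: once the bytes match again, the read-time verification succeeds, even though the NameNode's earlier corruption record is a separate piece of state.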

This message was sent by Atlassian JIRA
