hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HDFS-387) Corrupted blocks leading to job failures
Date Wed, 02 Mar 2011 23:59:36 GMT

     [ https://issues.apache.org/jira/browse/HDFS-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HDFS-387.
-----------------------------------------

    Resolution: Not A Problem

Closing. Please feel free to reopen if this is still a problem.

> Corrupted blocks leading to job failures
> ----------------------------------------
>
>                 Key: HDFS-387
>                 URL: https://issues.apache.org/jira/browse/HDFS-387
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Christian Kunz
>
> On one of our clusters we ended up with 11 singly-replicated corrupted blocks (checksum errors), such that jobs were failing because no live replicas were available.
> fsck reports the system as healthy, although it is not.
> I argue that fsck should have an option to check whether under-replicated blocks are okay.
> Even better, the namenode should automatically check under-replicated blocks with repeated replication failures for corruption and list them somewhere on the GUI. And for checksum errors, there should be an option to undo the corruption and recompute the checksums.
> Question: Is it at all probable that two or more replicas of a block have checksum errors? If not, then we could reduce the checking to singly-replicated blocks.
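
For reference, a few fsck and dfsadmin invocations that can surface blocks like these from the command line (a sketch only; exact flags vary by Hadoop version, and -list-corruptfileblocks in particular is not available in older releases):

    # Full block report: which files map to which blocks, and where the replicas live
    hadoop fsck / -files -blocks -locations

    # List files with missing or corrupt blocks, where the flag is supported
    hadoop fsck / -list-corruptfileblocks

    # Cluster-wide summary, including the under-replicated block count
    hadoop dfsadmin -report

On the closing question: if each replica independently fails its checksum with some small probability p, then all r replicas being corrupt happens with probability roughly p^r, so for r >= 2 simultaneous corruption should be far rarer than the singly-replicated case, which supports restricting the extra checking to singly-replicated blocks.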

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira