hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
Date Mon, 16 Aug 2010 17:19:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898994#action_12898994 ]

dhruba borthakur commented on HDFS-1111:

There is an application (RaidNode) that needs to detect missing blocks and fix them as soon
as possible. The design is such that it should find missing blocks within minutes (rather
than hours).

1. The getCorruptFiles() method satisfies this goal perfectly. It can be enhanced by adding
a method to DistributedFileSystem that exposes this API in a more formal fashion.

2. Another alternative would be to remove NameNode.getCorruptFiles() and add it to fsck.

3. A third alternative would be to introduce a callback mechanism that is registered with
the NN; the callback is invoked when the NN detects missing blocks.

I prefer approach 1 because it is more direct and consumes fewer resources than option 2.
Option 3 is very heavyweight.
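
To make the comparison concrete, here is a minimal sketch of how a client such as RaidNode
might use approach 1 by polling every few minutes. The names CorruptFileSource,
MissingBlockListener and CorruptFilePoller are hypothetical stand-ins invented for this
illustration, and plain path strings are assumed as the return type; none of this is the
actual HDFS API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for whatever exposes the corrupt-file list (approach 1).
interface CorruptFileSource {
    String[] getCorruptFiles();
}

// Hypothetical handler that the polling client calls for each newly seen corrupt file.
interface MissingBlockListener {
    void onCorruptFile(String path);
}

// Polls the source and reports each corrupt file only once.
class CorruptFilePoller {
    private final CorruptFileSource source;
    private final MissingBlockListener listener;
    private final Set<String> alreadyReported = new HashSet<>();

    CorruptFilePoller(CorruptFileSource source, MissingBlockListener listener) {
        this.source = source;
        this.listener = listener;
    }

    // One polling pass; the caller would schedule this every few minutes.
    // Returns the number of files reported for the first time in this pass.
    int pollOnce() {
        int newlyReported = 0;
        for (String path : source.getCorruptFiles()) {
            if (alreadyReported.add(path)) {
                listener.onCorruptFile(path);
                newlyReported++;
            }
        }
        return newlyReported;
    }
}
```

With this shape the NN keeps no per-client state; under option 3 it would instead have to
track registered callbacks and invoke them itself.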

> getCorruptFiles() should give some hint that the list is not complete
> ---------------------------------------------------------------------
>                 Key: HDFS-1111
>                 URL: https://issues.apache.org/jira/browse/HDFS-1111
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Rodrigo Schmidt
>            Assignee: Sriram Rao
>         Attachments: HADFS-1111.0.patch, HDFS-1111-y20.1.patch, HDFS-1111-y20.2.patch
> The list of corrupt files returned by the namenode doesn't say anything if the number
of corrupted files is larger than the call output limit (which means the list is not complete).
There should be a way to hint incompleteness to clients.
> A simple hack would be to add an extra entry to the array returned with the value null.
Clients could interpret this as a sign that there are other corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more confident when the
list is complete and less confident when the list is known to be incomplete.
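
The null-entry hint described in the issue could look roughly like the sketch below. The
class CorruptFileListing and its methods are hypothetical names chosen for illustration,
and plain path strings are assumed for the entries; this is not taken from any of the
attached patches.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the "extra null entry" hint.
class CorruptFileListing {

    // Returns at most `limit` paths. If the input had more, a trailing null
    // entry is appended to signal that the listing was truncated.
    static String[] withSentinel(String[] allCorrupt, int limit) {
        if (allCorrupt.length <= limit) {
            return allCorrupt.clone();
        }
        String[] out = Arrays.copyOf(allCorrupt, limit + 1);
        out[limit] = null; // overwrite the extra slot with the sentinel
        return out;
    }

    // Client-side check: a trailing null means other corrupt files exist.
    static boolean isIncomplete(String[] listing) {
        return listing.length > 0 && listing[listing.length - 1] == null;
    }
}
```

A client that does not know about the convention would see a null element, so existing
callers would need a null check before such a change is rolled out.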

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
