hadoop-hdfs-issues mailing list archives

From "Sriram Rao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
Date Sat, 31 Jul 2010 06:00:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894252#action_12894252 ]

Sriram Rao commented on HDFS-1111:

The case for paging was made by you (?) in one of the JIRAs on this issue.  You went looking
for a list of files in an important dir and found that the 500 limit was getting in the way.

Your patch has the namenode do the filtering (and this has caused problems).

What we are proposing instead is to have the namenode return a list of corrupt files to the
client and then let the client do the filtering.  We envision using this feature in
an iterative approach to fixing corruption:
1. get a list of corrupt files for a certain path 
2. fix up the corrupt files in that path
3. iterate; stop if the list of corrupt files is empty

By being iterative, this proposal also addresses one of the issues you brought up: namely,
that the list of corrupt files can change between successive paging calls.
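
A minimal sketch of the loop described above, assuming a namenode call that returns a (possibly capped) list of corrupt file paths and leaves filtering to the client. The names IterativeRepair, repairUnderPath, listCorrupt, fix and maxRounds are hypothetical and are not part of any HDFS API; the round cap is an added safeguard, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class IterativeRepair {
    /** True if file is the given path itself or lies below it. */
    static boolean isUnder(String file, String path) {
        String prefix = path.endsWith("/") ? path : path + "/";
        return file.equals(path) || file.startsWith(prefix);
    }

    /**
     * Repeats "list, filter on the client, fix" until no corrupt file
     * under path is reported, or maxRounds is reached.
     *
     * listCorrupt stands in for the namenode call (hypothetical).
     * fix stands in for whatever repair action is used (hypothetical).
     * Returns the number of rounds in which at least one file was fixed.
     */
    static int repairUnderPath(String path,
                               Supplier<List<String>> listCorrupt,
                               Consumer<String> fix,
                               int maxRounds) {
        int rounds = 0;
        while (rounds < maxRounds) {
            // Step 1: fetch the list and filter it on the client side.
            List<String> underPath = new ArrayList<>();
            for (String file : listCorrupt.get()) {
                if (isUnder(file, path)) {
                    underPath.add(file);
                }
            }
            // Step 3: stop once nothing under the path is reported.
            if (underPath.isEmpty()) {
                break;
            }
            // Step 2: fix what was reported in this round.
            for (String file : underPath) {
                fix.accept(file);
            }
            rounds++;
        }
        return rounds;
    }
}
```

Because each round asks for a fresh list, files that become corrupt (or get repaired) between calls are simply picked up or dropped in the next round.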

Fsck is a fall-back.  With the petabytes of data that we have in our clusters, a full fsck
takes a few hours to finish.

> getCorruptFiles() should give some hint that the list is not complete
> ---------------------------------------------------------------------
>                 Key: HDFS-1111
>                 URL: https://issues.apache.org/jira/browse/HDFS-1111
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>         Attachments: HDFS-1111.0.patch
> The list of corrupt files returned by the namenode doesn't say anything if the number
> of corrupted files is larger than the call output limit (which means the list is not complete).
> There should be a way to hint incompleteness to clients.
> A simple hack would be to add an extra entry to the array returned with the value null.
> Clients could interpret this as a sign that there are other corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more confident when the
> list is complete and less confident when the list is known to be incomplete.
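
The null-entry hack in the description could look roughly like the sketch below. It is only an illustration: CorruptFileListHint, capWithHint, isTruncated and entries are hypothetical names, not the getCorruptFiles() implementation or the attached patch, and it assumes the limit is non-negative.

```java
import java.util.Arrays;
import java.util.List;

class CorruptFileListHint {
    /**
     * Server side (sketch): return at most limit entries; if more exist,
     * append one extra null entry as the "list is incomplete" hint.
     */
    static String[] capWithHint(List<String> allCorrupt, int limit) {
        if (allCorrupt.size() <= limit) {
            return allCorrupt.toArray(new String[0]);
        }
        String[] reply = new String[limit + 1];
        for (int i = 0; i < limit; i++) {
            reply[i] = allCorrupt.get(i);
        }
        reply[limit] = null; // sentinel: more corrupt files exist
        return reply;
    }

    /** Client side (sketch): a trailing null means the list was cut off. */
    static boolean isTruncated(String[] reply) {
        return reply.length > 0 && reply[reply.length - 1] == null;
    }

    /** Client side (sketch): the real entries, without the sentinel. */
    static String[] entries(String[] reply) {
        return isTruncated(reply)
                ? Arrays.copyOf(reply, reply.length - 1)
                : reply;
    }
}
```

A client that forgets to check for the sentinel would see a null path in its list, which is one reason the description calls this a hack.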

