hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
Date Thu, 29 Jul 2010 00:54:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893457#action_12893457 ]

Konstantin Shvachko commented on HDFS-1111:

There are several things that went wrong with HDFS-729, and I think this jira should take
care of them.
# The API for getting corrupt files is incomplete.
#- There is no way to tell whether the corrupt files returned by fsck are all the corrupt files or
whether there are more. This jira directly addresses that problem.
#- If there are more, there is no way to obtain the exhaustive list of corrupt files by repeatedly
calling getCorruptFiles(). This is the next question people ask about this feature.
#- Filtering over a sub-tree. This is the subject of HDFS-1265.
# Synchronization problem. HDFS-729 introduced {{UnderReplicatedBlocks.getQueue(level)}},
which returns a reference to an internal queue and so opens the way for unsynchronized use of
this collection.
# Unnecessary {{ClientProtocol}} changes. I still prefer to think that the changes were just
a mistake rather than an attempt to silently squeeze in some changes for external tools. One
way or another, wire protocol changes should be backed by an important use case. So far you
have not made a clear case for that, as Sanjay pointed out.
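
To illustrate the synchronization point, here is a minimal sketch of the difference between handing out a live internal queue and returning a copy taken under the lock. The class {{PriorityQueuesSketch}} and its methods are hypothetical stand-ins written for this example; they are not the actual {{UnderReplicatedBlocks}} code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a per-priority block queue; NOT the real HDFS class. */
class PriorityQueuesSketch {
    private final List<TreeSet<String>> queues = new ArrayList<>();

    PriorityQueuesSketch(int levels) {
        for (int i = 0; i < levels; i++) {
            queues.add(new TreeSet<>());
        }
    }

    /** Writers hold the object's lock while mutating a queue. */
    synchronized void add(int level, String blockId) {
        queues.get(level).add(blockId);
    }

    /**
     * Leaky accessor: returns the live internal set. Callers can iterate or
     * modify it without holding the lock, which is the problem described above.
     */
    TreeSet<String> getQueueUnsafe(int level) {
        return queues.get(level);
    }

    /** Safer accessor: copies the queue while holding the lock. */
    synchronized List<String> snapshot(int level) {
        return new ArrayList<>(queues.get(level));
    }
}
```

A caller that iterates the result of {{snapshot(level)}} is unaffected by later {{add}} calls, at the cost of one copy per call. Whether that cost is acceptable for large queues is a separate design question.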

> getCorruptFiles() should give some hint that the list is not complete
> ---------------------------------------------------------------------
>                 Key: HDFS-1111
>                 URL: https://issues.apache.org/jira/browse/HDFS-1111
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>         Attachments: HADFS-1111.0.patch
> The list of corrupt files returned by the namenode doesn't say anything if the number
of corrupted files is larger than the call output limit (which means the list is not complete).
There should be a way to hint incompleteness to clients.
> A simple hack would be to add an extra entry to the array returned with the value null.
Clients could interpret this as a sign that there are other corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more confident when the
list is complete and less confident when the list is known to be incomplete.
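
The null-entry hack described above can be sketched as follows, next to an explicit-flag alternative for comparison. Everything here is an assumption made for illustration: the class {{CorruptFileListingSketch}}, its method names, and the {{Batch}} type are hypothetical and do not reflect the attached patch or any HDFS API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of two ways to signal a truncated corrupt-file list. */
class CorruptFileListingSketch {

    /** Null-sentinel variant: a trailing null entry means "there are more". */
    static String[] listWithSentinel(List<String> allCorrupt, int limit) {
        int n = Math.min(limit, allCorrupt.size());
        boolean truncated = allCorrupt.size() > limit;
        // Reserve one extra slot for the sentinel only when the list was cut off.
        String[] out = new String[truncated ? n + 1 : n];
        for (int i = 0; i < n; i++) {
            out[i] = allCorrupt.get(i);
        }
        // When truncated, the last slot keeps its default value, null.
        return out;
    }

    /** Client-side check for the sentinel. */
    static boolean isIncomplete(String[] result) {
        return result.length > 0 && result[result.length - 1] == null;
    }

    /** Explicit-flag variant: the result carries its own completeness bit. */
    static final class Batch {
        final List<String> paths;
        final boolean hasMore;

        Batch(List<String> paths, boolean hasMore) {
            this.paths = paths;
            this.hasMore = hasMore;
        }
    }

    static Batch listWithFlag(List<String> allCorrupt, int limit) {
        int n = Math.min(limit, allCorrupt.size());
        return new Batch(new ArrayList<>(allCorrupt.subList(0, n)),
                         allCorrupt.size() > limit);
    }
}
```

The sentinel keeps the return type unchanged, but every client must remember to check for and skip the null entry. The flag variant makes incompleteness explicit at the cost of a new return type.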

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
