hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3998) Speed up fsck
Date Wed, 03 Oct 2012 08:14:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468402#comment-13468402 ]

Ming Ma commented on HDFS-3998:
-------------------------------

Ha, thanks Todd. Our big clusters run a Hadoop 1.0 equivalent, which doesn't have this
feature. In Hadoop 2.0, the listCorruptFileBlocks API was added, and both corrupt_files.jsp and
"fsck -list-corruptfileblocks" provide most of what I want.

1. Being able to retrieve corrupt blocks from the NN sounds like a useful feature to port to
Hadoop 1.0.
2. A block with only one replica is still bad; it would be nice to distinguish such blocks
from the rest of the under-replicated blocks.
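
To make point 2 concrete, here is a minimal sketch of the distinction, assuming we already have a live replica count per block. The class, enum, and method names are hypothetical and are not part of the NameNode or fsck code.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not NameNode or fsck code.
final class ReplicaTriage {
    enum Health { MISSING, SINGLE_REPLICA, UNDER_REPLICATED, HEALTHY }

    // Classify one block from its live replica count and target replication.
    static Health classify(int liveReplicas, int targetReplication) {
        if (liveReplicas <= 0) return Health.MISSING;
        // One copy left while more are expected: report separately from
        // ordinary under-replication because one more failure loses data.
        if (liveReplicas == 1 && targetReplication > 1) return Health.SINGLE_REPLICA;
        if (liveReplicas < targetReplication) return Health.UNDER_REPLICATED;
        return Health.HEALTHY;
    }

    // Group block IDs by category, given a map of blockId -> live replica count.
    static Map<Health, List<Long>> triage(Map<Long, Integer> liveByBlock, int targetReplication) {
        Map<Health, List<Long>> report = new EnumMap<>(Health.class);
        for (Health h : Health.values()) report.put(h, new ArrayList<>());
        for (Map.Entry<Long, Integer> e : liveByBlock.entrySet()) {
            report.get(classify(e.getValue(), targetReplication)).add(e.getKey());
        }
        return report;
    }

    public static void main(String[] args) {
        Map<Long, Integer> live = new LinkedHashMap<>();
        live.put(1001L, 0); // no replicas left
        live.put(1002L, 1); // single replica
        live.put(1003L, 2); // under-replicated, but not critical
        live.put(1004L, 3); // fully replicated
        System.out.println(triage(live, 3));
    }
}
```

The sketch uses one target replication for all blocks to stay short; in practice the target is set per file, so a real report would compare each block against its own file's setting.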

Comments?
                
> Speed up fsck
> -------------
>
>                 Key: HDFS-3998
>                 URL: https://issues.apache.org/jira/browse/HDFS-3998
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Ming Ma
>
> We have some big clusters. Sometimes we want to quickly find the list of missing blocks or
> blocks with only one replica. Currently fsck has to take a path as input, and it then
> recursively checks for inconsistencies. It can take a long time to find the missing blocks
> and the files they belong to. It would be useful to speed this up. For example,
> fsck could go directly to the missing blocks stored in the NN and do the file lookup instead.
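
The difference between the two approaches in the description can be sketched with a toy in-memory namespace. Everything here is an assumption for illustration: the maps stand in for whatever file-to-block and block-to-file mappings the NN keeps, and none of the names come from Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not NameNode or fsck code.
final class MissingBlockLookup {
    // Path-driven approach: visit every file under root and test each of its
    // blocks. Cost grows with the size of the namespace under root.
    // (Plain prefix matching is a simplification of a real directory walk.)
    static Set<String> scanNamespace(Map<String, List<Long>> blocksByFile,
                                     String root, Set<Long> missing) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, List<Long>> e : blocksByFile.entrySet()) {
            if (!e.getKey().startsWith(root)) continue;
            for (long block : e.getValue()) {
                if (missing.contains(block)) affected.add(e.getKey());
            }
        }
        return affected;
    }

    // Block-driven approach: start from the missing blocks and look up the
    // owning file for each. Cost grows with the number of missing blocks.
    static Set<String> lookupOwners(Map<Long, String> fileByBlock, Set<Long> missing) {
        Set<String> affected = new TreeSet<>();
        for (long block : missing) {
            String owner = fileByBlock.get(block);
            if (owner != null) affected.add(owner);
        }
        return affected;
    }

    // Build the block -> file reverse index from the file -> blocks map.
    static Map<Long, String> invert(Map<String, List<Long>> blocksByFile) {
        Map<Long, String> fileByBlock = new HashMap<>();
        for (Map.Entry<String, List<Long>> e : blocksByFile.entrySet()) {
            for (long block : e.getValue()) fileByBlock.put(block, e.getKey());
        }
        return fileByBlock;
    }

    // Small made-up namespace used by main and by callers who want a demo.
    static Map<String, List<Long>> sampleNamespace() {
        Map<String, List<Long>> ns = new HashMap<>();
        ns.put("/data/a", new ArrayList<>(Arrays.asList(1L, 2L)));
        ns.put("/data/b", new ArrayList<>(Arrays.asList(3L)));
        ns.put("/logs/c", new ArrayList<>(Arrays.asList(4L, 5L)));
        return ns;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> ns = sampleNamespace();
        Set<Long> missing = new TreeSet<>(Arrays.asList(2L, 5L));
        System.out.println(scanNamespace(ns, "/", missing));
        System.out.println(lookupOwners(invert(ns), missing));
    }
}
```

Both methods return the same set of files for the root path; the second one never touches files whose blocks are all healthy, which is where the speedup would come from on a large namespace.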

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
