hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-101) DFSck - fsck-like utility for checking DFS volumes
Date Fri, 24 Mar 2006 00:56:19 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-101?page=comments#action_12371659 ] 

Doug Cutting commented on HADOOP-101:

I like that this uses nothing more than the client API to check the server.  That
keeps the server core lean and mean.  The use of RPCs also effectively limits the impact of
the scan on the FS.

A datanode operation that streams through a block without transferring it over the wire won't
correctly check checksums using our existing mechanism.  To check file content we could instead
simply implement a map-reduce job that streams through all the files in the fs.  This would
not take much code: nothing additional in the core.  MapReduce should handle the locality,
so that most data shouldn't go over the wire.
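
A minimal sketch of the map-reduce idea above, written as a local simulation in plain Python
rather than against the Hadoop API. It assumes a CRC32 is recorded per fixed-size chunk at
write time; the 512-byte chunk size and the names make_checksums, map_verify, reduce_report
and scan are all hypothetical, chosen only for illustration. The map step streams through one
file and compares each chunk to its recorded checksum; the reduce step keeps the failures.

```python
import zlib

CHUNK = 512  # assumed bytes per checksum chunk; illustrative only


def make_checksums(data, chunk=CHUNK):
    """CRC32 of each fixed-size chunk, as a writer might record at write time."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def map_verify(path, data, checksums, chunk=CHUNK):
    """Map step: stream through one file and compare each chunk to its recorded CRC."""
    actual = make_checksums(data, chunk)
    if len(actual) != len(checksums):
        return path, "length-mismatch"
    for index, (got, want) in enumerate(zip(actual, checksums)):
        if got != want:
            return path, "bad-chunk-%d" % index
    return path, "ok"


def reduce_report(results):
    """Reduce step: keep only the files that failed verification."""
    return sorted((path, status) for path, status in results if status != "ok")


def scan(files):
    """files maps path -> (data, recorded_checksums); returns the failing files."""
    return reduce_report(map_verify(p, d, c) for p, (d, c) in files.items())
```

In a real job the framework would schedule each map task near the data it reads, which is
what keeps most bytes off the wire; this sketch only shows the per-file check and the
aggregation.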

BTW, blocks not used by any file are not known to the name node, are they?  When they're reported
by a datanode the datanode is told to remove them.
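
The behaviour described above can be sketched roughly as follows. This is a simplified model
in plain Python, not the actual name node code: process_block_report and its arguments are
hypothetical names, and block ids are plain integers. The name node only knows blocks that
belong to some file, so any reported block outside that set is returned for removal.

```python
def process_block_report(file_blocks, reported):
    """Split one datanode's block report into blocks to keep and blocks to remove.

    file_blocks: dict mapping file path -> list of block ids (the name node's view).
    reported: iterable of block ids the datanode says it holds.
    """
    # The only blocks the name node knows are those referenced by some file.
    known = {b for blocks in file_blocks.values() for b in blocks}
    reported = set(reported)
    keep = sorted(b for b in reported if b in known)
    remove = sorted(b for b in reported if b not in known)
    return keep, remove
```

Under this model a client-side check never sees orphaned blocks, since they are dropped as
soon as a datanode reports them.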

> DFSck - fsck-like utility for checking DFS volumes
> --------------------------------------------------
>          Key: HADOOP-101
>          URL: http://issues.apache.org/jira/browse/HADOOP-101
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: DFSck.java
> This is a utility to check health status of a DFS volume, and collect some additional

