hadoop-hdfs-issues mailing list archives

From "LiuLei (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6491) proposal for developing a tool to compare files/dirs
Date Fri, 06 Jun 2014 06:56:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019629#comment-14019629 ]

LiuLei commented on HDFS-6491:

I think we can generate an MD5 checksum for each block file of an HDFS file and then compare the checksums.
If the MD5 checksums are the same in the two HDFS clusters, the HDFS file contents are the same.
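
A minimal sketch of the comparison step, using only the Java standard library. It assumes the block data is readable as an ordinary file or stream; the class and method names (BlockDigestCompare, md5Hex, sameContent) are hypothetical and not part of any Hadoop API. Two caveats apply: equal MD5 digests mean equal content only with very high probability, and comparing block by block only lines up if both clusters store the file with the same block size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hadoop: digest two byte sources and compare.
class BlockDigestCompare {

    // MD5 is guaranteed to be available on every Java platform.
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Render digest bytes as lowercase hex.
    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Digest of an in-memory buffer.
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    // Digest of a stream, read in fixed-size chunks so large blocks
    // are never held in memory all at once.
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest md = newMd5();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return toHex(md.digest());
    }

    // Two files are treated as identical when their MD5 digests match.
    static boolean sameContent(Path a, Path b) throws IOException {
        try (InputStream ia = Files.newInputStream(a);
             InputStream ib = Files.newInputStream(b)) {
            return md5Hex(ia).equals(md5Hex(ib));
        }
    }
}
```

In a real tool each cluster would compute the digests locally and only the short hex strings would be exchanged, so the file data itself never has to cross clusters.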

> proposal for developing a tool to compare files/dirs
> ----------------------------------------------------
>                 Key: HDFS-6491
>                 URL: https://issues.apache.org/jira/browse/HDFS-6491
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: tools
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
> We have a tool, distcp, that copies files much like the unix/linux cp command but does so
> in a distributed way. However, we don't have a tool to compare files/dirs. I think providing such
> a tool would be helpful. We can name it distdiff to be consistent with distcp.
> Right now I'm thinking about providing some basic functionality as a starting point,
> and we can add more features or performance improvements later.
> I had the opportunity to discuss this with [~daryn] and [~szetszwo] in person. Many thanks to both
> of them for their very valuable input.

This message was sent by Atlassian JIRA
