hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums
Date Wed, 05 Sep 2012 02:06:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448393#comment-13448393 ]

Eli Collins commented on HDFS-3889:
-----------------------------------

Good find.  Or perhaps have an option that checks CRCs but just logs failures. I imagine the
motivation for this was to avoid stopping a large distcp job because one call to getFileChecksum
failed (though it's robust, e.g. it checks multiple DNs, so that should probably be rare).
                
> distcp overwrites files even when there are missing checksums
> -------------------------------------------------------------
>
>                 Key: HDFS-3889
>                 URL: https://issues.apache.org/jira/browse/HDFS-3889
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 2.2.0-alpha
>            Reporter: Colin Patrick McCabe
>            Priority: Minor
>
> If distcp can't read the checksum files for the source and destination files -- for any
> reason -- it ignores the checksums and overwrites the destination file.  It does produce a
> log message, but I think the correct behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use {{-skipcrccheck}} to
> do so.
> The relevant code is in DistCpUtils#checksumsAreEqual:
> {code}
>     try {
>       sourceChecksum = sourceFS.getFileChecksum(source);
>       targetChecksum = targetFS.getFileChecksum(target);
>     } catch (IOException e) {
>       LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
>     }
> {code}
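
The two behaviors discussed in this thread (fail the copy when a checksum cannot be read, or check but only log) could be sketched roughly as below. This is a minimal, self-contained illustration, not the DistCp implementation: ChecksumReader, OnFailure, and the byte[] checksum type are hypothetical stand-ins so the example runs without Hadoop on the classpath, and treating an unreadable checksum as "equal" in log-only mode is an assumption about what such an option would mean.

```java
import java.io.IOException;
import java.util.Arrays;

final class ChecksumCheckSketch {

    /** Hypothetical stand-in for a file system's checksum lookup; not a Hadoop type. */
    interface ChecksumReader {
        byte[] read(String path) throws IOException;
    }

    /** Hypothetical policy for a checksum that cannot be retrieved. */
    enum OnFailure { FAIL, LOG_ONLY }

    static boolean checksumsAreEqual(ChecksumReader sourceFS, String source,
                                     ChecksumReader targetFS, String target,
                                     OnFailure policy) throws IOException {
        try {
            byte[] sourceChecksum = sourceFS.read(source);
            byte[] targetChecksum = targetFS.read(target);
            return Arrays.equals(sourceChecksum, targetChecksum);
        } catch (IOException e) {
            String msg = "Unable to retrieve checksum for " + source + " or " + target;
            if (policy == OnFailure.FAIL) {
                // Strict mode: surface the failure so the caller can stop the copy.
                throw new IOException(msg, e);
            }
            // Log-only mode (assumption): report the failure and let the job continue.
            System.err.println(msg + ": " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        ChecksumReader healthy = path -> new byte[] {1, 2, 3};
        ChecksumReader failing = path -> { throw new IOException("checksum unavailable"); };
        System.out.println(checksumsAreEqual(healthy, "/src/f", healthy, "/dst/f", OnFailure.FAIL));
        System.out.println(checksumsAreEqual(healthy, "/src/f", failing, "/dst/f", OnFailure.LOG_ONLY));
    }
}
```

Making the policy an explicit argument keeps the strict behavior as something the caller has to opt out of, which is similar in spirit to requiring {{-skipcrccheck}} before checksums are ignored.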

