hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect
Date Sat, 01 Sep 2012 01:12:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446569#comment-13446569 ]

Colin Patrick McCabe commented on HDFS-3054:

I confirmed through manual testing that -skipcrccheck does indeed cause the crc checking paths
to be bypassed.

However, I found this code in {{DistCpUtils#checksumsAreEqual}}:
    try {
      sourceChecksum = sourceFS.getFileChecksum(source);
      targetChecksum = targetFS.getFileChecksum(target);
    } catch (IOException e) {
      LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
    }

I think this should be a fatal error for the distcp operation unless {{-skipcrccheck}} is
set.  Silently ignoring checksums when we can't retrieve them doesn't seem like good behavior.
Perhaps we should open a different JIRA for that, though...
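
To make the suggestion concrete, here is a minimal sketch in plain Java, without the Hadoop dependency. {{ChecksumSource}} and the {{skipCrc}} parameter are hypothetical stand-ins for {{FileSystem#getFileChecksum}} and the {{-skipcrccheck}} option; this is not the actual DistCp code or a proposed patch. It only changes the exception path: a null checksum (a file system that exposes no checksum) is still treated as a match, which is an assumption carried over from the existing behavior.

```java
import java.io.IOException;
import java.util.Arrays;

final class StrictChecksumCheck {

  /** Hypothetical stand-in for FileSystem#getFileChecksum; may return null. */
  interface ChecksumSource {
    byte[] getFileChecksum(String path) throws IOException;
  }

  /**
   * Compares the checksums of source and target.
   * A failure to retrieve either checksum is rethrown (fatal) unless skipCrc is set.
   */
  static boolean checksumsAreEqual(ChecksumSource sourceFS, String source,
                                   ChecksumSource targetFS, String target,
                                   boolean skipCrc) throws IOException {
    if (skipCrc) {
      return true; // the caller explicitly asked to bypass verification
    }
    byte[] sourceChecksum;
    byte[] targetChecksum;
    try {
      sourceChecksum = sourceFS.getFileChecksum(source);
      targetChecksum = targetFS.getFileChecksum(target);
    } catch (IOException e) {
      // Propagate instead of logging and carrying on.
      throw new IOException(
          "Unable to retrieve checksum for " + source + " or " + target, e);
    }
    // Assumption: null means "no checksum available", treated as a match.
    return sourceChecksum == null || targetChecksum == null
        || Arrays.equals(sourceChecksum, targetChecksum);
  }

  public static void main(String[] args) throws IOException {
    ChecksumSource healthy = path -> new byte[] {1, 2, 3};
    ChecksumSource failing = path -> {
      throw new IOException("simulated checksum failure");
    };
    System.out.println(checksumsAreEqual(healthy, "/src", healthy, "/dst", false));
    System.out.println(checksumsAreEqual(failing, "/src", healthy, "/dst", true));
    try {
      checksumsAreEqual(failing, "/src", healthy, "/dst", false);
    } catch (IOException e) {
      System.out.println("fatal: " + e.getMessage());
    }
  }
}
```

With the skip flag set, the failing source is never consulted; without it, the same failure aborts the comparison instead of being logged and ignored.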
> distcp -skipcrccheck has no effect
> ----------------------------------
>                 Key: HDFS-3054
>                 URL: https://issues.apache.org/jira/browse/HDFS-3054
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.23.2, 2.0.0-alpha, 2.0.1-alpha, 2.2.0-alpha
>            Reporter: patrick white
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-3054.002.patch, hdfs-3054.patch
> Using distcp with '-skipcrccheck' still seems to cause CRC checksums to happen.
> Ran into this while debugging an issue associated with source and destination having
> different blocksizes, and not using the preserve blocksize parameter (-pb). In both 23.1 and
> 23.2 builds, trying to bypass the checksum verification by using the '-skipcrccheck' parameter
> had no effect; the distcp still failed on checksum errors.
> Test scenario to reproduce:
> do not use '-pb' and try a distcp from 20.205 (default blksize=128M) to .23 (default
> blksize=256M). The distcp fails on checksum errors, which is expected due to the checksum calculation
> (tiered aggregation of all blks). Trying the same distcp with only '-skipcrccheck' added still
> fails with the same checksum error; it is expected that the checksum would now be bypassed and
> the distcp would proceed.
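
The block-size dependence described in the report can be illustrated with a small sketch. It is loosely modeled on a tiered scheme (a CRC per chunk, an MD5 per block over those CRCs, then an MD5 over the block digests) and is not HDFS's actual implementation; the 512-byte chunk size, the block sizes, and all names here are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

final class TieredChecksumDemo {

  static final int BYTES_PER_CRC = 512; // assumed chunk size

  /** MD5 over the CRC32 values of each chunk in data[from, to). */
  static byte[] blockDigest(byte[] data, int from, int to)
      throws NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    for (int off = from; off < to; off += BYTES_PER_CRC) {
      int len = Math.min(BYTES_PER_CRC, to - off);
      CRC32 crc = new CRC32();
      crc.update(data, off, len);
      md5.update(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array());
    }
    return md5.digest();
  }

  /** MD5 over the per-block digests; the result depends on blockSize. */
  static byte[] fileChecksum(byte[] data, int blockSize)
      throws NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    for (int off = 0; off < data.length; off += blockSize) {
      md5.update(blockDigest(data, off, Math.min(data.length, off + blockSize)));
    }
    return md5.digest();
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    byte[] data = new byte[4096];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    byte[] small = fileChecksum(data, 1024); // same bytes, smaller blocks
    byte[] large = fileChecksum(data, 2048); // same bytes, larger blocks
    System.out.println("checksums match: " + Arrays.equals(small, large));
  }
}
```

Because the block boundaries decide which chunk CRCs are grouped under each intermediate digest, identical file contents yield different top-level checksums when the block size differs, which is why the copy without '-pb' is expected to fail verification.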

