hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect
Date Sat, 01 Sep 2012 01:12:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446569#comment-13446569 ]

Colin Patrick McCabe commented on HDFS-3054:
--------------------------------------------

I confirmed through manual testing that -skipcrccheck does indeed cause the crc checking paths
to be bypassed.

However, I found this code in {{DistCpUtils#checksumsAreEqual}}:
{code}
    try {
      sourceChecksum = sourceFS.getFileChecksum(source);
      targetChecksum = targetFS.getFileChecksum(target);
    } catch (IOException e) {
      LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
    }
{code} 

I think this should be a fatal error for the distcp operation unless {{-skipcrccheck}} is
set. Silently ignoring checksums when we can't retrieve them doesn't seem like good behavior.
Perhaps we should open a separate JIRA for that, though...
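
A minimal sketch of the stricter behavior suggested above, not the actual patch. It assumes a hypothetical {{ChecksumSource}} interface standing in for {{FileSystem#getFileChecksum}}, plain strings in place of paths and checksum objects, and a hypothetical {{skipCrc}} parameter mirroring {{-skipcrccheck}}. Handling of missing (null) checksums is left out.

```java
import java.io.IOException;

final class ChecksumCheckSketch {

    /** Hypothetical stand-in for FileSystem#getFileChecksum; not a Hadoop type. */
    interface ChecksumSource {
        String getFileChecksum(String path) throws IOException;
    }

    /**
     * Compares source and target checksums.
     * skipCrc is a hypothetical parameter mirroring -skipcrccheck.
     */
    static boolean checksumsAreEqual(ChecksumSource sourceFS, String source,
                                     ChecksumSource targetFS, String target,
                                     boolean skipCrc) throws IOException {
        if (skipCrc) {
            // The caller explicitly asked to bypass checksum comparison.
            return true;
        }
        String sourceChecksum;
        String targetChecksum;
        try {
            sourceChecksum = sourceFS.getFileChecksum(source);
            targetChecksum = targetFS.getFileChecksum(target);
        } catch (IOException e) {
            // Propagate instead of only logging, so the copy fails visibly.
            throw new IOException(
                "Unable to retrieve checksum for " + source + " or " + target, e);
        }
        // Assumption: both checksums are non-null in this sketch.
        return sourceChecksum.equals(targetChecksum);
    }

    public static void main(String[] args) {
        ChecksumSource failing = path -> {
            throw new IOException("checksum unavailable for " + path);
        };
        try {
            checksumsAreEqual(failing, "/src/file", failing, "/dst/file", false);
            System.out.println("checksums compared");
        } catch (IOException e) {
            System.out.println("copy should fail: " + e.getMessage());
        }
    }
}
```

With {{skipCrc}} set, the retrieval failure is never reached, which matches the idea that only an explicit opt-out should silence checksum problems.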
                
> distcp -skipcrccheck has no effect
> ----------------------------------
>
>                 Key: HDFS-3054
>                 URL: https://issues.apache.org/jira/browse/HDFS-3054
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.23.2, 2.0.0-alpha, 2.0.1-alpha, 2.2.0-alpha
>            Reporter: patrick white
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-3054.002.patch, hdfs-3054.patch
>
>
> Using distcp with '-skipcrccheck' still seems to cause CRC checksums to happen. 
> Ran into this while debugging an issue associated with source and destination having
> different block sizes, and not using the preserve blocksize parameter (-pb). In both 23.1 and
> 23.2 builds, trying to bypass the checksum verification by using the '-skipcrccheck' parameter
> had no effect; the distcp still failed on checksum errors.
> Test scenario to reproduce:
> do not use '-pb' and try a distcp from 20.205 (default blksize=128M) to .23 (default
> blksize=256M). The distcp fails on checksum errors, which is expected due to the checksum calculation
> (tiered aggregation of all blks). Trying the same distcp only providing '-skipcrccheck' still
> fails with the same checksum error; it is expected that the checksum would now be bypassed and
> the distcp would proceed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
