hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3054) distcp -skipcrccheck has no effect
Date Wed, 22 Aug 2012 01:14:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439207#comment-13439207 ]

Colin Patrick McCabe commented on HDFS-3054:
--------------------------------------------

Hi Rahul,
This patch looks good to me overall.  I created a rebased version of it that incorporates
changes from trunk.

I wish we could have a unit test for this.  It's a little difficult to create a good one without
getting kind of uncomfortably coupled to the HDFS code (after all this is in tools).  We would
probably have to create an API to corrupt file checksums in HDFS, and use that.  Then we could
be sure that distcp -skipcrccheck was not consulting the CRC.  I'm not sure if this is worth
the extra work, though... thoughts?
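
As a rough illustration of the test idea above, here is a minimal sketch in Python. It is not DistCp's code: `verify_copy`, `copy_succeeds`, and `ChecksumMismatch` are hypothetical names, and the deliberately mismatched checksum strings stand in for the "corrupt the file checksum" API that does not exist yet. The point is only that a test needs checksums that are known to differ, so that a passing copy proves the comparison was skipped.

```python
class ChecksumMismatch(Exception):
    """Raised when source and target checksums differ."""


def verify_copy(src_checksum: str, dst_checksum: str, skip_crc: bool) -> None:
    # Hypothetical stand-in for the post-copy comparison step.
    if skip_crc:
        return  # the skip option must bypass the comparison entirely
    if src_checksum != dst_checksum:
        raise ChecksumMismatch(f"{src_checksum} != {dst_checksum}")


def copy_succeeds(src_checksum: str, dst_checksum: str, skip_crc: bool) -> bool:
    # Report whether verification lets the copy through.
    try:
        verify_copy(src_checksum, dst_checksum, skip_crc)
        return True
    except ChecksumMismatch:
        return False
```

With mismatched checksums, `copy_succeeds("aaaa", "bbbb", skip_crc=True)` should pass while the same call with `skip_crc=False` should fail; a real test would need HDFS to produce the mismatch.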
                
> distcp -skipcrccheck has no effect
> ----------------------------------
>
>                 Key: HDFS-3054
>                 URL: https://issues.apache.org/jira/browse/HDFS-3054
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.23.2, 2.0.0-alpha, 2.0.1-alpha, 2.2.0-alpha
>            Reporter: patrick white
>         Attachments: HDFS-3054.002.patch, hdfs-3054.patch
>
>
> Using distcp with '-skipcrccheck' still seems to cause CRC checksums to happen.
> Ran into this while debugging an issue associated with source and destination having
> different blocksizes, and not using the preserve blocksize parameter (-pb). In both 23.1 and
> 23.2 builds, trying to bypass the checksum verification by using the '-skipcrccheck' parameter
> had no effect; the distcp still failed on checksum errors.
> Test scenario to reproduce:
> Do not use '-pb' and try a distcp from 20.205 (default blksize=128M) to .23 (default
> blksize=256M). The distcp fails on checksum errors, which is expected due to checksum calculation
> (tiered aggregation of all blks). Trying the same distcp while only providing '-skipcrccheck' still
> fails with the same checksum error; it is expected that the checksum would now be bypassed and
> the distcp would proceed.
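
To see why differing block sizes alone produce a checksum mismatch, the "tiered aggregation" mentioned in the description can be modeled in a few lines of Python. This is a simplified sketch, not the HDFS implementation: the 512-byte chunk size, the function names, and the exact byte layout are assumptions chosen for illustration. It computes a CRC per chunk, an MD5 over each block's CRCs, and an MD5 over the block digests.

```python
import hashlib
import zlib

BYTES_PER_CRC = 512  # assumed chunk size for this sketch


def block_digest(block: bytes) -> bytes:
    # MD5 over the CRC32 values of each chunk in one block.
    crcs = b"".join(
        zlib.crc32(block[i:i + BYTES_PER_CRC]).to_bytes(4, "big")
        for i in range(0, len(block), BYTES_PER_CRC)
    )
    return hashlib.md5(crcs).digest()


def file_checksum(data: bytes, block_size: int) -> str:
    # MD5 over the per-block digests, so block boundaries affect the result.
    digests = b"".join(
        block_digest(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    )
    return hashlib.md5(digests).hexdigest()


data = bytes(range(256)) * 16  # 4096 bytes of identical content on both sides
same_layout = file_checksum(data, 1024) == file_checksum(data, 1024)
different_layout = file_checksum(data, 1024) == file_checksum(data, 2048)
```

In this model `same_layout` is true and `different_layout` is false even though the bytes are identical, which matches the failure seen when '-pb' is omitted.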

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
