hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
Date Wed, 07 Mar 2018 18:49:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-15273:
    Status: Patch Available  (was: Open)

Patch 001

* allows -skipcrccheck everywhere
* when the source and destination filesystem schemes differ, or are not the HDFS ones (hdfs, webhdfs, swebhdfs),
a message about the filesystems is printed instead of one about block size
* the error message gains \n line formatting
* and it uses the correct name of the option to disable the checks
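
As a rough illustration of the message selection described above, here is a minimal sketch. It is not the code in HADOOP-15273-001.patch: the class name, method names and message wording are all hypothetical, and it assumes the rule is "print the block-size hint only when both stores use the same HDFS-family scheme, otherwise print a filesystem hint".

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not taken from the actual patch. */
final class ChecksumMismatchMessage {

    // The schemes the update above lists as "the HDFS ones".
    private static final Set<String> HDFS_SCHEMES = Set.of("hdfs", "webhdfs", "swebhdfs");

    private static String scheme(URI uri) {
        String s = uri.getScheme();
        return s == null ? "" : s.toLowerCase(Locale.ROOT);
    }

    /** Assumption: the block-size hint only makes sense for two stores on the same HDFS-family scheme. */
    static boolean blockSizeHintApplies(URI source, URI target) {
        String src = scheme(source);
        return src.equals(scheme(target)) && HDFS_SCHEMES.contains(src);
    }

    /** Builds a multi-line error message; the wording is illustrative only. */
    static String describe(URI source, URI target) {
        StringBuilder msg = new StringBuilder("Checksum mismatch between ")
            .append(source).append(" and ").append(target).append(".\n");
        if (blockSizeHintApplies(source, target)) {
            msg.append("Source and target may differ in block-size; ")
               .append("use -pb to preserve block-sizes during copy.\n");
        } else {
            msg.append("Source and target filesystems may use different checksum algorithms.\n");
        }
        // Name the option as it is spelled on the distcp command line.
        msg.append("Use -skipcrccheck to skip the checksum check.\n");
        return msg.toString();
    }
}
```

With this sketch, an hdfs to s3a copy would get the filesystem hint, while a copy between two hdfs clusters would still get the block-size hint.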

Tests: not easy to add. Maybe once HADOOP-15209 is in I could do it; we'd need something in hadoop-aws
with a mini HDFS cluster, which is not an easy undertaking.

I have manually tested it and verified that the -skipcrccheck option is propagated down.

Even with this patch, I'm wondering whether it's best to revert the s3a etag feature until
we have distcp better able to cope.

> distcp can't handle remote stores with different checksum algorithms
> --------------------------------------------------------------------
>                 Key: HADOOP-15273
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15273
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>         Attachments: HADOOP-15273-001.patch
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch between
src and dest store types (e.g. hdfs to s3), then the error message will talk about block size,
even when it's the underlying checksum protocol itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes during copy.
Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums,
one runs the risk of masking data-corruption during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file is renamed
into place. *And you can't disable it then.*
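
To illustrate why a cross-store comparison fails regardless of block size, here is a small self-contained sketch. CRC32 and MD5 are stand-ins chosen only because they ship with the JDK; they are not claimed to be the algorithms HDFS or S3 actually use, and the `ChecksumComparison` class and its methods are hypothetical. The point is that checksums produced by different algorithms over identical data cannot meaningfully be compared, so the comparison should first check the algorithm.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical illustration; the algorithms here are stand-ins, not what the stores use. */
final class ChecksumComparison {

    enum Result { MATCH, MISMATCH, INCOMPARABLE }

    /** A checksum tagged with the name of the algorithm that produced it. */
    static final class Checksum {
        final String algorithm;
        final byte[] bytes;

        Checksum(String algorithm, byte[] bytes) {
            this.algorithm = algorithm;
            this.bytes = bytes;
        }
    }

    static Checksum crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        // CRC32 yields a 32-bit value; store it as 4 bytes.
        byte[] bytes = ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
        return new Checksum("CRC32", bytes);
    }

    static Checksum md5(byte[] data) {
        try {
            return new Checksum("MD5", MessageDigest.getInstance("MD5").digest(data));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Only compares the bytes when both checksums come from the same algorithm. */
    static Result compare(Checksum a, Checksum b) {
        if (!a.algorithm.equals(b.algorithm)) {
            return Result.INCOMPARABLE;
        }
        return Arrays.equals(a.bytes, b.bytes) ? Result.MATCH : Result.MISMATCH;
    }
}
```

A byte-for-byte comparison without the algorithm check would report a mismatch for identical data whenever the two sides use different algorithms, which is the situation a misleading block-size message hides.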
