hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
Date Wed, 07 Mar 2018 21:46:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390291#comment-16390291 ]

Steve Loughran commented on HADOOP-15273:
-----------------------------------------

Patch 003
* fixes checkstyle
* fixes tests

With HADOOP-15297 making the etags => checksum feature in S3A optional, this isn't quite
a blocker. It still is one, though, when you try to distcp between any two stores with different
checksum algorithms, because right now only -update lets you skip the checks. If any other FS
offers checksums, things will break.
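A minimal sketch of the option rule described above, assuming (as distcp's option validation does) that skipping CRC checks is only accepted together with `-update`. The function name `checksum_check_enabled` is hypothetical; it models the decision rather than calling any Hadoop API.

```python
def checksum_check_enabled(update: bool, skip_crc: bool) -> bool:
    """Hypothetical model: would distcp compare source/target checksums?

    Assumption: skipping CRC checks is rejected unless -update is also set,
    so a plain copy between stores always runs the checksum comparison.
    """
    if skip_crc and not update:
        # Mirrors distcp refusing -skipcrccheck without -update.
        raise ValueError("-skipcrccheck is only valid together with -update")
    # The check runs unless it was explicitly (and validly) skipped.
    return not skip_crc
```

On the command line, the only combination that turns the check off is `hadoop distcp -update -skipcrccheck <source> <target>`; a copy without `-update` has no way to avoid the comparison.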

> distcp can't handle remote stores with different checksum algorithms
> --------------------------------------------------------------------
>
>                 Key: HADOOP-15273
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15273
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>         Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch between
src and dest store types (e.g. HDFS to S3), the error message will talk about block size,
even when it's the underlying checksum protocol itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes during copy.
Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums,
one runs the risk of masking data-corruption during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file is renamed
into place, *and you can't disable it then*.
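A hypothetical sketch of the distinction the description asks for: only treat differing checksums as possible corruption when both stores report the same algorithm, and otherwise report the algorithm difference instead of blaming block size. `Checksum` and `compare_checksums` are illustrative names, not distcp's API, and this is not necessarily what the attached patches implement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checksum:
    # Hypothetical stand-in for a filesystem checksum: the algorithm name
    # reported by the store plus the raw digest bytes.
    algorithm: str
    digest: bytes


def compare_checksums(src: Optional[Checksum], dst: Optional[Checksum]) -> str:
    """Classify a source/target checksum pair.

    Returns "unavailable" if either store offers no checksum,
    "incomparable" if the stores use different algorithms,
    otherwise "match" or "mismatch".
    """
    if src is None or dst is None:
        return "unavailable"
    if src.algorithm != dst.algorithm:
        # Digests from different algorithms always differ, so a difference
        # here says nothing about corruption or block size.
        return "incomparable"
    return "match" if src.digest == dst.digest else "mismatch"
```

Only the "mismatch" outcome would justify failing the copy; "incomparable" is the HDFS-to-S3 case, where the useful error names the checksum algorithms rather than suggesting `-pb`.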



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
