hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9613) Minor improvement and clean up in distcp
Date Tue, 05 Jan 2016 07:20:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082559#comment-15082559

Yongjun Zhang commented on HDFS-9613:

Hi [~drankye],

Thanks for your work here. I have a question. My understanding is that if the source and
target clusters have different default block sizes, then without the -pb switch the block
sizes of the source and target files can differ. However, the following code requires the
block sizes to match and suggests using the -pb switch. If that's intended, may I know why?

    if (sourceFS.getFileStatus(source).getBlockSize() !=
        targetFS.getFileStatus(target).getBlockSize()) {
      errorMessage.append(" Source and target differ in block-size.")
        .append(" Use -pb to preserve block-sizes during copy.");
      checkFailed = true;
    } else if (!DistCpUtils.checksumsAreEqual(sourceFS, source,
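Assuming HDFS's default MD5-of-MD5-of-CRC file checksum, one plausible reason for such a check is that this checksum depends on block boundaries: two files with identical bytes but different block sizes generally report different checksums. The Java sketch below is a simplified, hypothetical model of that composition, not HDFS code. `BlockSizeChecksumDemo`, `fileChecksum`, and the 512-byte chunk size are illustrative assumptions, and real HDFS differs in detail (for example, CRC32C is the default checksum type and per-block digests are computed on DataNodes).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

// Simplified, hypothetical model of an MD5-of-MD5-of-CRC32 file checksum.
// Not HDFS code: the chunk size and byte layout are illustrative assumptions.
public class BlockSizeChecksumDemo {
  static final int BYTES_PER_CRC = 512; // assumed number of bytes covered by each CRC

  static MessageDigest md5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // CRC32 per chunk -> MD5 per block over that block's CRCs -> MD5 over all block MD5s.
  static byte[] fileChecksum(byte[] data, int blockSize) {
    MessageDigest fileDigest = md5();
    for (int blockStart = 0; blockStart < data.length; blockStart += blockSize) {
      int blockEnd = Math.min(blockStart + blockSize, data.length);
      MessageDigest blockDigest = md5();
      for (int chunk = blockStart; chunk < blockEnd; chunk += BYTES_PER_CRC) {
        CRC32 crc = new CRC32();
        crc.update(data, chunk, Math.min(BYTES_PER_CRC, blockEnd - chunk));
        long v = crc.getValue();
        // Append the 4-byte big-endian CRC value to this block's digest.
        blockDigest.update(new byte[] {
            (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v});
      }
      // The block grouping determines what goes into the outer digest.
      fileDigest.update(blockDigest.digest());
    }
    return fileDigest.digest();
  }

  public static void main(String[] args) {
    byte[] data = new byte[8192];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) (i * 31);
    }
    byte[] small = fileChecksum(data, 2048);
    System.out.println(Arrays.equals(small, fileChecksum(data, 2048))); // true
    System.out.println(Arrays.equals(small, fileChecksum(data, 4096))); // false
  }
}
```

The bytes and the per-chunk CRCs are identical in both runs; only their grouping into blocks changes, and that alone changes the outer digest in this model.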


> Minor improvement and clean up in distcp
> ----------------------------------------
>                 Key: HDFS-9613
>                 URL: https://issues.apache.org/jira/browse/HDFS-9613
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>            Priority: Minor
>         Attachments: HDFS-9613-v1.patch
> While working on a related issue, I noticed some places in {{distcp}} that could be
> improved and cleaned up. In particular, after a file is copied to the target cluster,
> distcp checks whether the copied file is correct. When checking, it is better to compare
> the block size first and then the checksum, because the latter is more expensive.
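The ordering proposed in the description can be sketched as follows. This is a hypothetical, stand-alone Java sketch, not the distcp patch: `CopyVerifier`, `FileInfo`, and `verifyCopy` are illustrative names, and the checksum comparison is passed in as a `BooleanSupplier` so the example runs without a cluster.

```java
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the verification order described above:
// cheap metadata comparison first, expensive checksum comparison last.
public class CopyVerifier {
  // Hypothetical stand-in for the metadata distcp reads from a FileStatus.
  static final class FileInfo {
    final long blockSize;

    FileInfo(long blockSize) {
      this.blockSize = blockSize;
    }
  }

  // Returns an error message if verification fails, or empty if the copy passes.
  // checksumsAreEqual is only invoked when the block sizes already match.
  static Optional<String> verifyCopy(FileInfo source, FileInfo target,
                                     BooleanSupplier checksumsAreEqual) {
    if (source.blockSize != target.blockSize) {
      return Optional.of("Source and target differ in block-size."
          + " Use -pb to preserve block-sizes during copy.");
    }
    if (!checksumsAreEqual.getAsBoolean()) {
      return Optional.of("Source and target differ in checksum.");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    int[] checksumCalls = {0};
    BooleanSupplier expensiveCheck = () -> {
      checksumCalls[0]++; // stands in for fetching checksums from both clusters
      return true;
    };
    Optional<String> result = verifyCopy(
        new FileInfo(128L << 20), new FileInfo(256L << 20), expensiveCheck);
    System.out.println(result.orElse("OK"));
    System.out.println("checksum calls: " + checksumCalls[0]); // checksum calls: 0
  }
}
```

Because the block-size comparison returns early, the expensive checksum comparison is never invoked when block sizes differ.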

This message was sent by Atlassian JIRA
