hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9613) Minor improvement and clean up in distcp
Date Tue, 05 Jan 2016 07:26:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082569#comment-15082569

Kai Zheng commented on HDFS-9613:

Hi [~yzhangal],

Thanks for your review and good question! My understanding is that if the block sizes are
different, the file checksums are very likely different as well. If we check the block size
first, we may save time by avoiding unnecessary file checksum computation. A corner case I
missed in the patch: when the file content is smaller than the block size, this logic doesn't
hold, as indicated by the failed test case. I'm updating the patch to handle this. Does it
make sense to you? Thanks for clarifying if I got something wrong.
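The ordering described above can be sketched as a small decision helper. This is a minimal,
self-contained illustration, not the actual distcp code: the method and class names
(`CopyCheck`, `blockSizeRulesOutMatch`) are hypothetical, and real distcp compares
`FileChecksum` objects obtained through the Hadoop `FileSystem` API rather than raw longs.

```java
// Hypothetical sketch of the "block size first, checksum second" check
// discussed above. Not the actual distcp implementation.
public class CopyCheck {

    /**
     * Returns true when the cheap block-size comparison alone proves the
     * files cannot have matching block-based checksums, letting the caller
     * skip the expensive checksum computation. Files that fit within a
     * single block are the corner case: block size does not affect their
     * layout, so this method returns false and the caller must still
     * compare checksums.
     */
    static boolean blockSizeRulesOutMatch(long srcLen, long srcBlockSize,
                                          long dstLen, long dstBlockSize) {
        // Only meaningful when both files actually span multiple blocks.
        boolean bothMultiBlock = srcLen > srcBlockSize && dstLen > dstBlockSize;
        return bothMultiBlock && srcBlockSize != dstBlockSize;
    }

    public static void main(String[] args) {
        // Large file, different block sizes: checksum computation can be skipped.
        System.out.println(blockSizeRulesOutMatch(512L << 20, 128L << 20,
                                                  512L << 20, 256L << 20)); // true
        // Small file (fits in one block): must fall through to the checksum check.
        System.out.println(blockSizeRulesOutMatch(1L << 20, 128L << 20,
                                                  1L << 20, 256L << 20)); // false
    }
}
```

The `bothMultiBlock` guard is what the updated patch adds for the corner case: a file smaller
than one block is stored identically regardless of the configured block size, so differing
block sizes alone prove nothing about its checksum.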

> Minor improvement and clean up in distcp
> ----------------------------------------
>                 Key: HDFS-9613
>                 URL: https://issues.apache.org/jira/browse/HDFS-9613
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>            Priority: Minor
>         Attachments: HDFS-9613-v1.patch
> While working on a related issue, it was noticed that there are some places in {{distcp}}
that could be improved and cleaned up. In particular, after a file is copied to the target
cluster, distcp checks whether the copied file is intact. When checking, it is better to
compare block sizes first and then the checksums, because the latter is somewhat expensive.

This message was sent by Atlassian JIRA
