hadoop-hdfs-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp
Date Thu, 20 Aug 2015 15:24:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705113#comment-14705113 ]

Hudson commented on HDFS-8828:

FAILURE: Integrated in Hadoop-trunk-Commit #8328 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8328/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun
Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java

> Utilize Snapshot diff report to build diff copy list in distcp
> --------------------------------------------------------------
>                 Key: HDFS-8828
>                 URL: https://issues.apache.org/jira/browse/HDFS-8828
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp, snapshots
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>             Fix For: 2.8.0
>         Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch, HDFS-8828.004.patch,
HDFS-8828.005.patch, HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, HDFS-8828.009.patch,
HDFS-8828.010.patch, HDFS-8828.011.patch
> Some users reported that building the file copy list in distcp takes a very long time (30 hours for 1.6M
files). We can leverage the snapshot diff report to build a copy list that includes only the files/dirs
that changed between two snapshots (or between a snapshot and a normal dir). This speeds up
the process in two ways: 1. less time spent building the copy list; 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory creation, deletion,
rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535
synchronizes deletions and renames, then falls back to the default distcp, so it still relies on
the default distcp to build the complete list of files under the source dir. This patch puts only
created and modified files into the copy list, based on the snapshot diff report, which minimizes
the number of files to copy.
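
The filtering idea described above can be illustrated with a minimal sketch. It is not the code from the patch: the function name `build_copy_list` and the sample paths are hypothetical, and the input is assumed to be the text form of a snapshot diff report, where each entry starts with a one-letter marker (`+` created, `-` deleted, `M` modified, `R` renamed). How the real patch treats modified directories or renamed entries is not covered here.

```python
from typing import Iterable, List

# One-letter markers that prefix entries in an HDFS snapshot diff report.
CREATED, MODIFIED, DELETED, RENAMED = "+", "M", "-", "R"


def build_copy_list(report_lines: Iterable[str]) -> List[str]:
    """Keep only created and modified paths from a diff report.

    Hypothetical helper: deletions and renames are assumed to be
    synchronized on the target separately (the HDFS-7535 step), so
    they are left out of the copy list.
    """
    copy_list = []
    for line in report_lines:
        # Split into the marker and the rest of the line (the path).
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        marker, path = parts[0], parts[1].strip()
        if marker in (CREATED, MODIFIED):
            copy_list.append(path)
    return copy_list


# Illustrative report entries (made-up paths).
report = [
    "M\t./logs/app.log",
    "+\t./data/new.csv",
    "-\t./data/old.csv",
    "R\t./a.txt -> ./b.txt",
]
print(build_copy_list(report))
```

With a report like this, only the modified and the newly created file end up in the copy list, which is why the list stays proportional to the number of changes rather than to the size of the source tree.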

This message was sent by Atlassian JIRA
