hadoop-hdfs-issues mailing list archives

From "Yufei Gu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
Date Tue, 28 Jul 2015 07:29:04 GMT
Yufei Gu created HDFS-8828:
------------------------------

             Summary: Utilize Snapshot diff report to build copy list in distcp
                 Key: HDFS-8828
                 URL: https://issues.apache.org/jira/browse/HDFS-8828
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Yufei Gu
            Assignee: Yufei Gu


Some users have reported that distcp takes a very long time to build the file copy list (30 hours
with 1.6M files). We can leverage the snapshot diff report to build a copy list that includes only
the files/dirs changed between two snapshots (or between a snapshot and a normal dir). This speeds
up the process in two ways: 1. less time spent building the copy list; 2. fewer file copy MR jobs.

The HDFS snapshot diff report provides information about file/directory creation, deletion, rename
and modification between two snapshots or between a snapshot and a normal directory. HDFS-7535
synchronizes deletions and renames, then falls back to the default distcp. So it still relies on
the default distcp to build the copy list, which traverses all files under the source dir. This
patch will build the copy list based on the snapshot diff report.
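
To illustrate the idea, here is a minimal sketch of filtering a diff report down to a copy list. It is not the patch itself: DiffType, DiffEntry and buildCopyList are hypothetical stand-ins rather than Hadoop classes, and it assumes that created and modified paths need copying while deletions and renames are synchronized separately (as in HDFS-7535). Expanding newly created directories and translating paths under renamed directories are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types are Hadoop classes.
class SnapshotDiffCopyList {

    // The four kinds of change a snapshot diff report describes.
    enum DiffType { CREATE, DELETE, RENAME, MODIFY }

    // One diff entry: what happened, and to which path (relative to the snapshot root).
    static final class DiffEntry {
        final DiffType type;
        final String path;

        DiffEntry(DiffType type, String path) {
            this.type = type;
            this.path = path;
        }
    }

    // Assumption: only created and modified paths need to be copied.
    // Deletions and renames are applied on the target separately.
    static List<String> buildCopyList(List<DiffEntry> diff) {
        List<String> copyList = new ArrayList<>();
        for (DiffEntry entry : diff) {
            if (entry.type == DiffType.CREATE || entry.type == DiffType.MODIFY) {
                copyList.add(entry.path);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        List<DiffEntry> diff = Arrays.asList(
            new DiffEntry(DiffType.MODIFY, "data/part-0001"),
            new DiffEntry(DiffType.CREATE, "data/part-0002"),
            new DiffEntry(DiffType.DELETE, "data/old-part"),
            new DiffEntry(DiffType.RENAME, "tmp/staging"));
        // Only the modified and created paths end up in the copy list.
        System.out.println(buildCopyList(diff));
    }
}
```

The cost of building the list this way scales with the number of diff entries rather than with the total number of files under the source dir, which is where the time saving is expected to come from.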



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
