Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 19 Aug 2015 18:54:46 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703563#comment-14703563 ]

Hadoop QA commented on HDFS-8828:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | javac | 0m 6s | The patch appears to cause the build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12751292/HDFS-8828.010.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f61120d |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12044/console |


This message was automatically generated.

> Utilize Snapshot diff report to build copy list in distcp
> ---------------------------------------------------------
>
>                 Key: HDFS-8828
>                 URL: https://issues.apache.org/jira/browse/HDFS-8828
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp, snapshots
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, HDFS-8828.009.patch, HDFS-8828.010.patch
>
>
> Some users reported that building the file copy list in distcp takes a very long time (30 hours for 1.6M files). We can leverage the snapshot diff report to build a file copy list that includes only the files/dirs changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy list building time. 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots or a snapshot and a normal directory. HDFS-7535 synchronizes deletions and renames, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch puts only created and modified files into the copy list, based on the snapshot diff report, which minimizes the number of files to copy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
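
The idea in the issue description (put only created and modified entries from the snapshot diff report into the copy list) can be illustrated with a minimal, self-contained sketch. This is not the code from the HDFS-8828 patches: the class SnapshotDiffCopyList, the DiffEntry type, the local DiffType enum and the buildCopyList method are all hypothetical names introduced for illustration, and the sketch ignores details a real implementation would have to handle, such as modified directories or files that were renamed and then modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Hadoop API and not the HDFS-8828 patch.
public class SnapshotDiffCopyList {

    // Local stand-in for the four kinds of change the issue says the diff report covers.
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    // Hypothetical diff entry: a change type plus a path relative to the snapshot root.
    static final class DiffEntry {
        final DiffType type;
        final String path;

        DiffEntry(DiffType type, String path) {
            this.type = type;
            this.path = path;
        }
    }

    // Keep only created and modified paths. Deletions and renames are assumed
    // to be synchronized separately (as the issue says HDFS-7535 does).
    static List<String> buildCopyList(List<DiffEntry> diff) {
        List<String> copyList = new ArrayList<>();
        for (DiffEntry entry : diff) {
            if (entry.type == DiffType.CREATE || entry.type == DiffType.MODIFY) {
                copyList.add(entry.path);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        List<DiffEntry> diff = Arrays.asList(
            new DiffEntry(DiffType.CREATE, "dir/new.txt"),
            new DiffEntry(DiffType.DELETE, "dir/old.txt"),
            new DiffEntry(DiffType.MODIFY, "dir/changed.txt"),
            new DiffEntry(DiffType.RENAME, "dir/moved.txt"));

        // prints [dir/new.txt, dir/changed.txt]
        System.out.println(buildCopyList(diff));
    }
}
```

The point of the sketch is that the copy list grows with the number of changed entries rather than with the total number of files under the source dir, which is where the description expects the time savings to come from.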