From: "Jitendra Nath Pandey (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Thu, 12 May 2016 21:57:13 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

    [ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282125#comment-15282125 ]

Jitendra Nath Pandey commented on HDFS-8828:
--------------------------------------------

+1 to the proposal.
'-delete' and '-diff' should have been mutually exclusive, but instead of throwing an exception, ignoring '-delete' sounds OK, since it prevents existing apps from failing.

> Utilize Snapshot diff report to build diff copy list in distcp
> --------------------------------------------------------------
>
>                 Key: HDFS-8828
>                 URL: https://issues.apache.org/jira/browse/HDFS-8828
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp, snapshots
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>             Fix For: 2.8.0
>
>         Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch
>
>
> Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a file copy list that includes only the files/dirs that changed between two snapshots (or between a snapshot and a normal dir). This speeds up the process in two ways: 1. less time spent building the copy list. 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory creation, deletion, rename, and modification between two snapshots or between a snapshot and a normal directory. HDFS-7535 synchronizes deletions and renames, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch puts only created and modified files into the copy list, based on the snapshot diff report, which minimizes the number of files to copy.
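
For readers following the thread, the two ideas discussed here (building the copy list from created/modified diff entries only, and ignoring '-delete' when '-diff' is given) can be sketched roughly as below. This is a minimal, self-contained illustration and not the code from the attached patches: DiffEntry, build_copy_list, and resolve_options are hypothetical names, the entry kinds simply mirror the "+", "-", "M", "R" markers printed by the HDFS snapshot diff report, and details such as recursing into newly created directories are left out.

```python
from typing import List, NamedTuple, Optional


class DiffEntry(NamedTuple):
    """One line of a snapshot diff report (hypothetical in-memory form)."""
    kind: str                      # "+" created, "-" deleted, "M" modified, "R" renamed
    path: str                      # path relative to the snapshot root
    target: Optional[str] = None   # rename destination, only used when kind == "R"


def build_copy_list(entries: List[DiffEntry]) -> List[str]:
    """Keep only created and modified paths, in report order, without duplicates.

    Deletions and renames are assumed to be synchronized separately
    (the thread attributes that step to HDFS-7535), so they are skipped here.
    """
    seen = set()
    copy_list = []
    for entry in entries:
        if entry.kind in ("+", "M") and entry.path not in seen:
            seen.add(entry.path)
            copy_list.append(entry.path)
    return copy_list


def resolve_options(use_diff: bool, delete: bool) -> dict:
    """Ignore the delete flag when diff mode is on, instead of raising an error."""
    if use_diff and delete:
        delete = False  # assumption: silently dropped so existing callers keep working
    return {"diff": use_diff, "delete": delete}


if __name__ == "__main__":
    report = [
        DiffEntry("M", "dir1/a.txt"),
        DiffEntry("+", "dir1/new.txt"),
        DiffEntry("-", "dir1/old.txt"),
        DiffEntry("R", "dir1/b", "dir1/c"),
    ]
    print(build_copy_list(report))
    print(resolve_options(use_diff=True, delete=True))
```

With a report of four entries (one modified, one created, one deleted, one renamed), only the modified and created paths end up in the copy list, which is where the saving over listing the whole source directory would come from.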