Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7EDC8C4DF for ; Wed, 12 Nov 2014 23:44:35 +0000 (UTC) Received: (qmail 41587 invoked by uid 500); 12 Nov 2014 23:44:35 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 41535 invoked by uid 500); 12 Nov 2014 23:44:35 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 41521 invoked by uid 99); 12 Nov 2014 23:44:35 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Nov 2014 23:44:35 +0000 Date: Wed, 12 Nov 2014 23:44:35 +0000 (UTC) From: "Yongjun Zhang (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-7312) Update DistCp v1 to optionally not use tmp location MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208914#comment-14208914 ] Yongjun Zhang commented on HDFS-7312: ------------------------------------- Hi [~jprosser], Thanks for the nice work here. I looked through, it looks good, except for a couple of improvements and nits: Improvements: * For the following new test created, to achieve better code sharing {code} /** copy files from dfs file system to dfs file system with skiptmp */ public void testCopyFromDfsToDfsWithSkiptmp() throws Exception { {code} suggest to create a new private method with an additional boolean parameter skipTmp, {{private void testCopyFromDfsToDfs(final boolean skipTmp}}. Then you can make the pre-existing test testCopyFromDfsToDfs and your new test to call this private method with false and true accordingly. * This same above suggestion applies to {{testCopySingleFileWithSkiptmp()}}. * Line 1221, do we really need to ceate the tmpDir here? For example, if we copy a single file to a destination file, why a tmpDir is needed? I think we can avoid doing this (removing that block of code), and have a file-existence checking when doing the fullyDelete. The point is, the TMP_DIR_LABEL may not always be created, we don't have to create it just for the purpose of deleting it. Right? Nits: * About usage description "Copy files directly to the final destination.", maybe we can change it to "Instead, copy files directly to the final destination." ? * Line 394, "// filename and the path becomes its parent directory.", replace "path" with "destPath" * Line 1218, rename "tmpDirRoot" to "tmpDirPrefix"? * Line 396, tab key not replaced with spaces. * line 400 remove newly added extra newline * Line 956, replace "The job to configure" with "The job configuration" * Line 1218, line is too long, need to be <= 80 columns based on hadoop code guideline; BTW, I guess you have tried to test it out in real clusters, right? Thanks a lot. > Update DistCp v1 to optionally not use tmp location > --------------------------------------------------- > > Key: HDFS-7312 > URL: https://issues.apache.org/jira/browse/HDFS-7312 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools > Affects Versions: 2.5.1 > Reporter: Joseph Prosser > Assignee: Joseph Prosser > Priority: Minor > Attachments: HDFS-7312.001.patch, HDFS-7312.002.patch, HDFS-7312.003.patch, HDFS-7312.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > DistCp v1 currently copies files to a tmp location and then renames that to the specified destination. This can cause performance issues on filesystems such as S3. A -skiptmp flag will be added to bypass this step and copy directly to the destination. This feature mirrors a similar one added to HBase ExportSnapshot [HBASE-11119|https://issues.apache.org/jira/browse/HBASE-11119] -- This message was sent by Atlassian JIRA (v6.3.4#6332)