hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7312) Update DistCp v1 to optionally not use tmp location
Date Wed, 12 Nov 2014 23:44:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208914#comment-14208914
] 

Yongjun Zhang commented on HDFS-7312:
-------------------------------------

Hi [~jprosser],

Thanks for the nice work here. I looked through, it looks good, except for a couple of improvements
and nits:

Improvements:

* For the following new test created, to achieve better code sharing
{code}
 /** copy files from dfs file system to dfs file system with skiptmp */
  public void testCopyFromDfsToDfsWithSkiptmp() throws Exception {
{code}
suggest to create a new private method with an additional boolean parameter skipTmp, {{private
void testCopyFromDfsToDfs(final boolean skipTmp}}. Then you can make the pre-existing test
testCopyFromDfsToDfs and your new test to call this private method with false and true accordingly.

* This same above suggestion applies to {{testCopySingleFileWithSkiptmp()}}.

* Line 1221, do we really need to ceate the tmpDir here? For example, if we copy a single
file to a destination file, why a tmpDir is needed? I think we can avoid doing this (removing
that block of code), and have a file-existence checking when doing the fullyDelete. The point
is, the TMP_DIR_LABEL may not always be created, we don't have to create it just for the purpose
of deleting it. Right?

Nits:

* About usage description "Copy files directly to the final destination.", maybe we can change
it to "Instead, copy files directly to the final destination." ?
* Line 394, "// filename and the path becomes its parent directory.", replace "path" with
"destPath"
* Line 1218, rename "tmpDirRoot" to "tmpDirPrefix"?
* Line 396, tab key not replaced with spaces. 
* line 400  remove newly added extra newline
* Line 956, replace "The job to configure" with "The job configuration"
* Line 1218, line is too long, need to be <= 80 columns based on hadoop code guideline;

BTW, I guess you have tried to test it out in real clusters, right?

Thanks a lot.


> Update DistCp v1 to optionally not use tmp location
> ---------------------------------------------------
>
>                 Key: HDFS-7312
>                 URL: https://issues.apache.org/jira/browse/HDFS-7312
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.5.1
>            Reporter: Joseph Prosser
>            Assignee: Joseph Prosser
>            Priority: Minor
>         Attachments: HDFS-7312.001.patch, HDFS-7312.002.patch, HDFS-7312.003.patch, HDFS-7312.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> DistCp v1 currently copies files to a tmp location and then renames that to the specified
destination.  This can cause performance issues on filesystems such as S3.  A -skiptmp flag
will be added to bypass this step and copy directly to the destination.  This feature mirrors
a similar one added to HBase ExportSnapshot [HBASE-11119|https://issues.apache.org/jira/browse/HBASE-11119]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message