hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13145) In DistCp, prevent unnecessary getFileStatus call when not preserving metadata.
Date Wed, 18 May 2016 19:50:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289674#comment-15289674 ]

Steve Loughran commented on HADOOP-13145:
-----------------------------------------

Tested the -003 patch against S3 (Ireland) and Azure.

{code}
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractDistCp
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 223.843 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractDistCp

...
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.354 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp

{code}

Interesting how much faster Azure is: about 65 s versus about 224 s for S3.

As it stands, the patch is going to add about 4 minutes to any run of the TestS3A* test
pattern. Could it be made one of the scale tests, so that its size comes from a configuration
option? Some tests already use {{scale.test.operation.count}} to control scale; we could add a
similar option for the DistCp file size and have the large file size driven by it. If the
option is specified in KB, those of us in a different country from an S3 endpoint could easily
tune it down.
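A minimal sketch of what such an option might look like. It assumes a hypothetical key {{scale.test.distcp.file.size.kb}} (not an existing Hadoop property) and a hypothetical default of 1024 KB. It reads from java.util.Properties only to stay self-contained; a real test would more likely read the value from its Hadoop Configuration via getLong.

```java
import java.util.Properties;

// Hypothetical sketch: the DistCp test's large-file size comes from a value in KB,
// in the spirit of scale.test.operation.count. The key and default are assumptions.
final class DistCpScaleConfig {
  // Hypothetical key; not defined by Hadoop.
  static final String FILE_SIZE_KB_KEY = "scale.test.distcp.file.size.kb";
  // Hypothetical default, chosen to keep runs against a distant endpoint short.
  static final long DEFAULT_FILE_SIZE_KB = 1024;

  private DistCpScaleConfig() {}

  /** Returns the large-file size in bytes, configured in KB. */
  static long largeFileSizeBytes(Properties conf) {
    String raw = conf.getProperty(FILE_SIZE_KB_KEY);
    long kb = (raw == null || raw.trim().isEmpty())
        ? DEFAULT_FILE_SIZE_KB
        : Long.parseLong(raw.trim());
    if (kb <= 0) {
      throw new IllegalArgumentException(FILE_SIZE_KB_KEY + " must be positive: " + kb);
    }
    return kb * 1024L;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // No value set: falls back to the default size.
    System.out.println(largeFileSizeBytes(conf));
    // Tuned down, e.g. for someone far from the S3 endpoint.
    conf.setProperty(FILE_SIZE_KB_KEY, "256");
    System.out.println(largeFileSizeBytes(conf));
  }
}
```

Keeping the unit in KB lets the same test scale from a quick smoke run to a large transfer without code changes.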

> In DistCp, prevent unnecessary getFileStatus call when not preserving metadata.
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-13145
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13145
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13145.001.patch, HADOOP-13145.003.patch
>
>
> After DistCp copies a file, it calls {{getFileStatus}} to get the {{FileStatus}} from
> the destination so that it can compare to the source and update metadata if necessary.  If
> the DistCp command was run without the option to preserve metadata attributes, then this additional
> {{getFileStatus}} call is wasteful.
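A minimal, self-contained Java sketch of the change the issue describes. {{Attr}}, {{Status}}, {{CountingFs}} and {{afterCopy}} are hypothetical stand-ins, not Hadoop's actual DistCp, FileStatus or FileSystem classes. The counter only shows that the destination lookup is skipped when no attributes are to be preserved.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the HADOOP-13145 idea; all types are simplified stand-ins.
final class PreserveSketch {
  /** Stand-in for the metadata attributes a copy may be asked to preserve. */
  enum Attr { PERMISSION, OWNER }

  /** Stand-in for a file status carrying only the fields this sketch compares. */
  static final class Status {
    final String permission;
    final String owner;
    Status(String permission, String owner) {
      this.permission = permission;
      this.owner = owner;
    }
  }

  /** Stand-in destination store that counts status lookups (remote calls on an object store). */
  static final class CountingFs {
    int getFileStatusCalls = 0;
    private final Status stored;
    CountingFs(Status stored) { this.stored = stored; }
    Status getFileStatus(String path) {
      getFileStatusCalls++;
      return stored;
    }
  }

  /** Runs after a file is copied; returns the attributes that would need updating. */
  static Set<Attr> afterCopy(CountingFs targetFs, String target, Status source, Set<Attr> preserve) {
    Set<Attr> toUpdate = EnumSet.noneOf(Attr.class);
    if (preserve.isEmpty()) {
      return toUpdate; // nothing to preserve: skip the extra getFileStatus call entirely
    }
    Status dest = targetFs.getFileStatus(target);
    if (preserve.contains(Attr.PERMISSION) && !dest.permission.equals(source.permission)) {
      toUpdate.add(Attr.PERMISSION);
    }
    if (preserve.contains(Attr.OWNER) && !dest.owner.equals(source.owner)) {
      toUpdate.add(Attr.OWNER);
    }
    return toUpdate;
  }

  public static void main(String[] args) {
    Status source = new Status("rwxr-x---", "alice");
    CountingFs fs = new CountingFs(new Status("rw-r--r--", "hdfs"));

    afterCopy(fs, "/dest/file", source, EnumSet.noneOf(Attr.class));
    System.out.println("lookups without preserve: " + fs.getFileStatusCalls);

    Set<Attr> updates = afterCopy(fs, "/dest/file", source, EnumSet.of(Attr.PERMISSION));
    System.out.println("lookups with preserve: " + fs.getFileStatusCalls + ", updates " + updates);
  }
}
```

The early return keeps the no-preserve path free of any extra status request, which should matter most on object stores where each such call is a remote request.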



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


