hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
Date Wed, 23 Dec 2009 15:12:29 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794068#action_12794068 ]

Hadoop QA commented on MAPREDUCE-1305:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428316/MAPREDUCE-1305.patch
  against trunk revision 893469.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/console

This message is automatically generated.

> Massive performance problem with DistCp and -delete
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-1305
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Peter Romianowski
>            Assignee: Peter Romianowski
>         Attachments: MAPREDUCE-1305.patch
>
>
> *First problem*
> In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when
> the path is all we need.
> The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write
> which tries to retrieve file permissions by issuing a "ls -ld <path>" which is painfully
> slow.
> Changed that to just serialize Path and not FileStatus.
> *Second problem*
> To delete the files we invoke the "hadoop" command line tool with option "-rmr <path>".
> Again, for each file.
> Changed that to dstfs.delete(path, true)
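
The second change can be illustrated with a minimal sketch. It is not code from the attached patch: it uses only java.nio.file as a local stand-in for the Hadoop FileSystem API, and the names InProcessDelete, deleteRecursively and deleteAll are hypothetical. The point it tries to show is that each path is removed by a plain in-process call, so no external command is started per file.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;

/** Hypothetical stand-in for illustration; not taken from MAPREDUCE-1305.patch. */
final class InProcessDelete {

    /**
     * Recursively deletes root with in-process calls, loosely analogous to
     * calling delete(path, true) on a file system object.
     * Returns the number of files and directories removed.
     */
    static int deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return 0; // nothing to delete
        }
        int[] removed = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // A directory is deleted only after all of its children are gone.
                Files.delete(dir);
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }

    /** Deletes every listed path in turn; no process is spawned per path. */
    static int deleteAll(List<Path> paths) throws IOException {
        int total = 0;
        for (Path p : paths) {
            total += deleteRecursively(p);
        }
        return total;
    }
}
```

Calling InProcessDelete.deleteAll with the list of stale destination paths removes them in a single JVM. The actual change described above does the equivalent through dstfs.delete(path, true).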

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

