hadoop-common-issues mailing list archives

From "Zoran Dimitrijevic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-1540) distcp should support an exclude list
Date Fri, 08 May 2015 20:07:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535403#comment-14535403 ]

Zoran Dimitrijevic commented on HADOOP-1540:
--------------------------------------------

#5: We were seeing performance issues with large numbers of files only because of RPCs
to either the NameNode or S3. Filtering each file name locally against a small number of
compiled regex or glob rules should not be a big deal, especially since it's optional. For
example, the sorting of a big file list that we do now is much more expensive.
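
A minimal sketch of the local filtering described above, assuming the exclude rules are given as Java regular expressions. The names ExcludeFilter and shouldCopy are hypothetical and are not taken from the attached patches; the point is only that the rules are compiled once and each path is then checked in memory, with no RPC involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the class from the HADOOP-1540 patches.
final class ExcludeFilter {
    // Patterns are compiled once, up front, so the per-file cost is only matching.
    private final List<Pattern> excludes = new ArrayList<>();

    ExcludeFilter(List<String> regexes) {
        for (String regex : regexes) {
            excludes.add(Pattern.compile(regex));
        }
    }

    // Returns false if the whole path matches any exclude rule, true otherwise.
    boolean shouldCopy(String path) {
        for (Pattern p : excludes) {
            if (p.matcher(path).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

With rules such as `.*\.tmp` and `.*/_temporary/.*`, a path like `/data/part-0000.tmp` would be skipped and `/data/part-0000` would be copied. With an empty rule list every path is copied, which matches the filter being optional.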

Thank you for your patch!

> distcp should support an exclude list
> -------------------------------------
>
>                 Key: HADOOP-1540
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1540
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 2.6.0
>            Reporter: Senthil Subramanian
>            Assignee: Rich Haase
>            Priority: Minor
>              Labels: BB2015-05-TBR, patch
>         Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch, HADOOP-1540.005.patch, HADOOP-1540.006.patch
>
>
> There should be a way to ignore specific paths (e.g., those that have already been copied over under the current srcPath).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
