hadoop-common-issues mailing list archives

From "Jason Cwik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14698) Make copyFromLocal's -t option available for put as well
Date Thu, 12 Apr 2018 20:58:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436296#comment-16436296 ]

Jason Cwik commented on HADOOP-14698:

As mentioned above in https://issues.apache.org/jira/browse/HADOOP-14698?focusedCommentId=16107552&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16107552,
the current threading model only works for the leaf nodes.  In deep/wide tree structures, the
enumeration itself can take a significant amount of time, especially with FileSystem
implementations such as S3A or other object store connectors.  I started a patch in HDFS-13398
to address this (especially for the `ls` and `du` commands), but it could likely be combined with
this effort to parallelize the FsShell module in general.

So far, we've tried two approaches.  The first simply creates another executor in the base
class and enqueues the child operations in processPaths.  The second uses ForkJoinPool (FJP)
to crawl the tree and process subtrees in parallel.  Currently, we have FJP working with `ls`
and `du`, but not with other operations.  I think FJP is the best route, since it would let us
do things like wait to delete a directory until all of its children have been deleted.  Doing
this properly, however, might require a significant refactoring of the whole FsShell module
to implement the correct ForkJoinTask structure.
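
To illustrate the ForkJoinPool idea described above, here is a minimal sketch of a `du`-style
traversal.  It is not the HDFS-13398 patch and does not touch FsShell: `ParallelDu`, `Node`,
`DuTask` and `du` are hypothetical names, and `Node` is an in-memory stand-in for a FileSystem
entry.  Only `ForkJoinPool` and `RecursiveTask` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelDu {
    /** Hypothetical in-memory stand-in for a FileSystem entry; not a Hadoop class. */
    static final class Node {
        final String name;
        final long length;                              // bytes for a file, 0 for a directory
        final List<Node> children = new ArrayList<>();  // stays empty for files

        Node(String name, long length) {
            this.name = name;
            this.length = length;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Sums the size of one subtree; every child becomes its own task. */
    static final class DuTask extends RecursiveTask<Long> {
        private final Node node;

        DuTask(Node node) {
            this.node = node;
        }

        @Override
        protected Long compute() {
            List<DuTask> subtasks = new ArrayList<>();
            for (Node child : node.children) {
                subtasks.add(new DuTask(child));
            }
            // Runs all child tasks, potentially in parallel, and returns
            // only once every one of them has completed.
            invokeAll(subtasks);

            // Post-order point: all children are done here. A delete-style
            // command could remove the (now empty) directory at this step.
            long total = node.length;
            for (DuTask t : subtasks) {
                total += t.join();
            }
            return total;
        }
    }

    static long du(Node root) {
        return ForkJoinPool.commonPool().invoke(new DuTask(root));
    }

    public static void main(String[] args) {
        Node root = new Node("/", 0)
            .add(new Node("a", 0).add(new Node("a/1", 10)).add(new Node("a/2", 20)))
            .add(new Node("b", 5));
        System.out.println(du(root));  // prints 35
    }
}
```

The parent task only continues after `invokeAll` returns, which is the ordering property that
makes "delete a directory after its children" straightforward in this model.  A real
implementation would list children through the FileSystem API inside `compute()`, which is
where the enumeration cost for object stores would be spread across threads.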


> Make copyFromLocal's -t option available for put as well
> --------------------------------------------------------
>                 Key: HADOOP-14698
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14698
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Andras Bokor
>            Assignee: Andras Bokor
>            Priority: Major
>         Attachments: HADOOP-14698.01.patch, HADOOP-14698.02.patch, HADOOP-14698.03.patch,
> HADOOP-14698.04.patch, HADOOP-14698.05.patch, HADOOP-14698.06.patch, HADOOP-14698.07.patch,
>
> After HDFS-11786 copyFromLocal and put are no longer identical.
> I do not see any reason why not to add the new feature to put as well.
> Being non-identical makes the understanding/usage of command more complicated from user
> point of view.

