hadoop-common-issues mailing list archives

From "wujinhu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
Date Mon, 05 Mar 2018 06:08:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385641#comment-16385641 ]

wujinhu commented on HADOOP-15262:

Thanks [~Sammi] for your review comments. I have fixed items 1 through 4.

For 5, copy operations will become inexpensive once OSS supports shallow copy, which is
expected soon. Users can then configure a larger number of threads to copy files, so it is
hard to define an upper limit for the waiting list size (unlike the pre-read configuration,
where read operations are expensive). Although the queue is defined as an unbounded queue,
we use SemaphoredDelegatingExecutor to limit the concurrency for any one directory.
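
To illustrate the point about item 5, here is a minimal Java sketch of how a semaphore can cap per-directory concurrency even when the underlying pool's queue is unbounded. It is not the patch itself and does not use Hadoop's SemaphoredDelegatingExecutor. The class name BoundedCopySketch, the method runCopies, the pool size of 8, and the sleep that stands in for a copy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedCopySketch {

    /**
     * Submits fileCount simulated copy tasks to a pool with an unbounded queue,
     * while allowing at most maxPerDirectory of them to be in flight at once.
     * Returns the highest number of tasks observed running at the same time.
     */
    static int runCopies(int fileCount, int maxPerDirectory) throws Exception {
        // newFixedThreadPool is backed by an unbounded LinkedBlockingQueue.
        ExecutorService sharedPool = Executors.newFixedThreadPool(8);
        Semaphore permits = new Semaphore(maxPerDirectory);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < fileCount; i++) {
                // The submitting thread blocks here until a permit is free.
                permits.acquire();
                try {
                    futures.add(sharedPool.submit(() -> {
                        try {
                            int now = running.incrementAndGet();
                            peak.accumulateAndGet(now, Math::max);
                            TimeUnit.MILLISECONDS.sleep(5); // hypothetical stand-in for a copy
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            running.decrementAndGet();
                            permits.release(); // hand the permit back when the task ends
                        }
                    }));
                } catch (RuntimeException e) {
                    permits.release(); // submission was rejected, so the task will never release
                    throw e;
                }
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every copy to finish
            }
        } finally {
            sharedPool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrency: " + runCopies(20, 3));
    }
}
```

Because the submitting thread blocks in acquire(), at most maxPerDirectory tasks from one directory are queued or running at any time, so a single rename cannot grow the unbounded queue without limit.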

For 6, since we read only one field of the AliyunOSSCopyFileContext class, there is no need
to call lock(). Reducing the number of lock() calls also improves performance.
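
The reasoning in item 6 can be sketched as follows. This is an illustrative Java class, not the real AliyunOSSCopyFileContext: the names CopyContextSketch, copyFailure, copiesFinished, markFailure, and awaitAll are hypothetical, and the sketch assumes the single field is declared volatile so that a lock-free read is visible across threads (the message above does not say how the field is declared).

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class CopyContextSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition progress = lock.newCondition();

    private int copiesFinished;           // guarded by lock
    private volatile boolean copyFailure; // assumption: volatile, so reads need no lock

    /** Single-field write: no lock needed under the volatile assumption. */
    void markFailure() {
        copyFailure = true;
    }

    /** Single-field read: skips lock(), which keeps the hot path cheap. */
    boolean isCopyFailure() {
        return copyFailure;
    }

    /** Counter update paired with a condition signal, so it takes the lock. */
    void incCopiesFinished() {
        lock.lock();
        try {
            copiesFinished++;
            progress.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until the expected number of copies has been reported. */
    void awaitAll(int expected) throws InterruptedException {
        lock.lock();
        try {
            while (copiesFinished < expected) {
                progress.await();
            }
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CopyContextSketch ctx = new CopyContextSketch();
        int files = 4;
        for (int i = 0; i < files; i++) {
            final int id = i;
            new Thread(() -> {
                if (id == 2) {
                    ctx.markFailure(); // simulate one failed copy
                }
                ctx.incCopiesFinished();
            }).start();
        }
        ctx.awaitAll(files);
        System.out.println("any copy failed: " + ctx.isCopyFailure());
    }
}
```

Only the counter, which must stay consistent with the condition variable, pays for the lock. The failure flag is a single independent value, so checking it between copies adds no contention.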

> AliyunOSS: rename() to move files in a directory in parallel
> ------------------------------------------------------------
>                 Key: HADOOP-15262
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15262
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Major
>             Fix For: 3.1.0, 2.9.1, 3.0.1
>         Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, HADOOP-15262.003.patch,
> Currently, the rename() operation renames the files in a directory serially, which is slow
> when the directory contains many files. We can improve this by renaming files in parallel.

