hadoop-common-issues mailing list archives

From "Abraham Fine (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14757) S3AFileSystem.innerRename() to size metadatastore lists better
Date Thu, 25 Jan 2018 18:22:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339590#comment-16339590 ]

Abraham Fine commented on HADOOP-14757:

[~stevel@apache.org] I'm not sure how much value we would get from doing something like this
relative to the amount of work involved. The single file copy case is trivial but the second
case gets a bit more complicated.

My guess is that this ticket refers to the {{dstMetas}} list created in {{innerRename}}.

We have multiple layers of iterators in play here and we would need to bubble up a guess for
the size of the list, which could be quite ugly.

I think another possible option would be to use a LinkedList since we do not do any random
access, but any benchmarks I found for ArrayList vs LinkedList performance showed minimal to
no gains and occasional slowdowns. I think the best action here may be no action. What do
you think?
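
For reference, the trivial single file case could look roughly like the sketch below. It is
only an illustration: {{PresizedLists}} and {{withExpectedSize}} are made-up names, and the list
holds plain strings instead of the metadata entries that {{innerRename}} tracks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of S3AFileSystem. */
final class PresizedLists {
    private PresizedLists() {
    }

    /**
     * Creates an ArrayList with room for expectedSize elements, so adding
     * up to that many elements does not trigger a backing-array resize.
     */
    static <T> List<T> withExpectedSize(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("negative size: " + expectedSize);
        }
        return new ArrayList<>(expectedSize);
    }
}
```

A single file rename would call {{PresizedLists.withExpectedSize(1)}}. The recursive case has no
such number available up front, which is where the extra plumbing would come in.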

> S3AFileSystem.innerRename() to size metadatastore lists better
> --------------------------------------------------------------
>                 Key: HADOOP-14757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14757
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Assignee: Abraham Fine
>            Priority: Minor
>             Fix For: 3.1.0
>
> In {{S3AFileSystem.innerRename()}}, various ArrayLists are created to track paths to
> update; these are created with the default size. It could/should be possible to allocate better,
> so avoid expensive array growth & copy operations while iterating through the list of
> # for a single file copy, sizes == 1
> # for a recursive copy, the outcome of the first real LIST will either provide the actual
> size, or, if the list == the max response, a very large minimum size.
> For #2, we'd need to get the hint of iterable length rather than just iterate through...some
> interface {{IterableLength.expectedMinimumSize()}} could do that.
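
The size hint idea for the recursive case might be sketched as follows. Only the name
{{IterableLength.expectedMinimumSize()}} comes from the issue text; its exact shape here, plus
{{SizeHints}} and {{FirstPage}}, are assumptions made up for illustration, and a real listing
iterator would wrap paged LIST responses rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical interface, shaped after the name proposed in the issue. */
interface IterableLength {
    /** A lower bound on how many elements iteration will produce. */
    int expectedMinimumSize();
}

/** Hypothetical consumer that presizes its list when a hint is offered. */
final class SizeHints {
    private SizeHints() {
    }

    static <T> List<T> copyOf(Iterable<T> source) {
        List<T> result;
        if (source instanceof IterableLength) {
            int hint = ((IterableLength) source).expectedMinimumSize();
            // Guard against a negative hint; ArrayList rejects those.
            result = new ArrayList<>(Math.max(hint, 0));
        } else {
            // No hint available: fall back to the default capacity.
            result = new ArrayList<>();
        }
        for (T item : source) {
            result.add(item);
        }
        return result;
    }
}

/** Hypothetical stand-in for the first page of a listing. */
final class FirstPage<T> implements Iterable<T>, IterableLength {
    private final List<T> entries;

    FirstPage(List<T> entries) {
        this.entries = entries;
    }

    @Override
    public Iterator<T> iterator() {
        return entries.iterator();
    }

    @Override
    public int expectedMinimumSize() {
        // The first page is known in full, so its size is a safe lower bound.
        return entries.size();
    }
}
```

The hint is only a minimum, so the list can still grow if later pages add entries; the sketch
just avoids the early resizes.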

This message was sent by Atlassian JIRA

