hadoop-common-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13371) S3A globber to use bulk listObject call over recursive directory scan
Date Mon, 12 Feb 2018 13:12:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360718#comment-16360718 ]

ASF GitHub Bot commented on HADOOP-13371:
-----------------------------------------

Github user steveloughran commented on the issue:

    https://github.com/apache/hadoop/pull/204
  
    wontfix. S3Guard is needed for consistency, and since it also delivers the speedup we need,
making traumatic changes to the core code is hard to justify right now.


> S3A globber to use bulk listObject call over recursive directory scan
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-13371
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13371
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> HADOOP-13208 produces O(1) listing of directory trees in {{FileSystem.listStatus}} calls,
> but does nothing for {{FileSystem.globStatus()}}, which uses a completely different
> codepath: a selective recursive scan that pattern-matches as it descends, filtering
> out the paths which don't match. Its cost is O(matching-directories) plus the cost of
> examining the files.
> It should be possible to do the glob status listing in S3A not through the filtered treewalk,
> but through a list + filter operation. This would be an O(files) lookup *before any filtering
> took place*.
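
The list + filter idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the S3A implementation or anything from the pull request: the class `FlatListGlobSketch` and its methods are hypothetical names, an in-memory list of keys stands in for the result of a bulk object listing, and only the `*` and `?` wildcards are handled (Hadoop globs support more syntax). It also matches only file keys, whereas a real `globStatus()` can return directories too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: glob via one flat listing plus a filter, instead of a tree walk. */
final class FlatListGlobSketch {

  /** Translate a simplified glob ('*' and '?' only) to a regex; wildcards never match '/'. */
  static Pattern globToRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    for (char c : glob.toCharArray()) {
      if (c == '*') {
        regex.append("[^/]*");
      } else if (c == '?') {
        regex.append("[^/]");
      } else {
        // Quote every literal character so '.', '(' etc. are not treated as regex syntax.
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString());
  }

  /** Longest wildcard-free prefix of the glob: the key prefix a bulk listing could be scoped to. */
  static String literalPrefix(String glob) {
    int i = 0;
    while (i < glob.length() && glob.charAt(i) != '*' && glob.charAt(i) != '?') {
      i++;
    }
    return glob.substring(0, i);
  }

  /** O(files) pass over a flat key listing; the pattern is applied only after listing. */
  static List<String> globByListAndFilter(List<String> allKeys, String glob) {
    String prefix = literalPrefix(glob);
    Pattern pattern = globToRegex(glob);
    List<String> matches = new ArrayList<>();
    for (String key : allKeys) {
      // Assumption: this prefix check stands in for a prefix-scoped bulk listing call.
      if (!key.startsWith(prefix)) {
        continue;
      }
      if (pattern.matcher(key).matches()) {
        matches.add(key);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList(
        "logs/2018/01/part-0000.csv",
        "logs/2018/02/part-0000.csv",
        "logs/2018/02/_SUCCESS",
        "logs/2017/12/part-0000.csv",
        "other/2018/01/part-0000.csv");
    System.out.println(globByListAndFilter(keys, "logs/2018/*/part-*.csv"));
  }
}
```

The trade-off the issue points at shows up here: every key under the literal prefix is listed before any filtering, so a glob with an early wildcard over a large tree pays for all files, while the tree walk pays per matching directory.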



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


