hive-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-14953) don't use globStatus on S3 in MM tables
Date Fri, 21 Oct 2016 02:17:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593700#comment-15593700 ]

Rajesh Balamohan edited comment on HIVE-14953 at 10/21/16 2:16 AM:
-------------------------------------------------------------------

[~sershe] - It should be listFiles(path, recursive). I mistakenly wrote it as a recursive listStatus
in my earlier comment.

Default FS: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L1814
S3A FS which optimizes for bulk listing: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2025

So instead of thousands of calls to S3 with globStatus, listFiles(path, recursive) would need only
a few calls to S3, and client-side path filtering can be applied as needed.
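
To make the call-count argument concrete, here is a minimal sketch using only the JDK. It is a toy in-memory model, not Hadoop or S3A code: ToyStore, its two list methods, the key layout (tbl/part=N/data_K) and the regex are all hypothetical names chosen for illustration, and result paging is ignored. It compares a directory-by-directory walk (one list request per directory, a rough stand-in for the glob-style traversal) against a single flat recursive listing followed by client-side filtering, which is the shape that listFiles(path, recursive) allows.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy model only: every name here is hypothetical, not Hadoop or S3A code.
class ListingSketch {

    /** In-memory stand-in for an object store: sorted keys plus a request counter. */
    static final class ToyStore {
        private final TreeSet<String> keys = new TreeSet<>();
        int listCalls = 0;

        void put(String key) { keys.add(key); }

        /** One request: immediate children of dir (must end with '/'); sub-directories end with '/'. */
        List<String> listStatus(String dir) {
            listCalls++;
            TreeSet<String> children = new TreeSet<>();
            for (String k : keys.tailSet(dir)) {
                if (!k.startsWith(dir)) break;      // keys sharing a prefix are contiguous
                if (k.equals(dir)) continue;        // ignore a marker key for the directory itself
                int slash = k.indexOf('/', dir.length());
                children.add(slash < 0 ? k : k.substring(0, slash + 1));
            }
            return new ArrayList<>(children);
        }

        /** One request: every key under dir, flat. Real stores page results; ignored here. */
        List<String> listFilesRecursive(String dir) {
            listCalls++;
            List<String> out = new ArrayList<>();
            for (String k : keys.tailSet(dir)) {
                if (!k.startsWith(dir)) break;
                out.add(k);
            }
            return out;
        }
    }

    /** Directory-by-directory traversal: one list request per directory visited. */
    static List<String> walkPerDirectory(ToyStore s, String root, Pattern wanted) {
        List<String> out = new ArrayList<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            for (String child : s.listStatus(todo.pop())) {
                if (child.endsWith("/")) todo.push(child);
                else if (wanted.matcher(child).matches()) out.add(child);
            }
        }
        Collections.sort(out);
        return out;
    }

    /** Single flat listing, then filter paths on the client. */
    static List<String> listOnceAndFilter(ToyStore s, String root, Pattern wanted) {
        List<String> out = new ArrayList<>();
        for (String k : s.listFilesRecursive(root)) {
            if (wanted.matcher(k).matches()) out.add(k);
        }
        return out;
    }

    /** Builds tbl/part=N/{data_0,data_1,_tmp_0} for N in [0, partitions). */
    static ToyStore sampleStore(int partitions) {
        ToyStore s = new ToyStore();
        for (int i = 0; i < partitions; i++) {
            s.put("tbl/part=" + i + "/data_0");
            s.put("tbl/part=" + i + "/data_1");
            s.put("tbl/part=" + i + "/_tmp_0");
        }
        return s;
    }

    public static void main(String[] args) {
        Pattern wanted = Pattern.compile("tbl/part=\\d+/data_\\d+");
        ToyStore s = sampleStore(100);
        List<String> walked = walkPerDirectory(s, "tbl/", wanted);
        System.out.println("walk: " + walked.size() + " files, " + s.listCalls + " list calls");
        s.listCalls = 0;
        List<String> bulk = listOnceAndFilter(s, "tbl/", wanted);
        System.out.println("bulk: " + bulk.size() + " files, " + s.listCalls + " list calls");
    }
}
```

With 100 partition directories the walk issues 101 list requests (the root plus one per partition) while the flat listing issues 1, and both return the same 200 paths. The absolute numbers mean nothing for a real store; only the way the request count scales with the number of directories is the point.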


> don't use globStatus on S3 in MM tables
> ---------------------------------------
>
>                 Key: HIVE-14953
>                 URL: https://issues.apache.org/jira/browse/HIVE-14953
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Sergey Shelukhin
>             Fix For: hive-14535
>
>         Attachments: HIVE-14953.patch
>
>
> Need to investigate if recursive get is faster. Also, normal listStatus might suffice
> because MM code handles directory structure in a more definite manner than old code; so it
> knows where the files of interest are to be found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
