hive-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables
Date Fri, 21 Oct 2016 02:15:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593700#comment-15593700 ]

Rajesh Balamohan commented on HIVE-14953:
-----------------------------------------

[~sershe] - It should be listFiles(path, recursive). I mistakenly wrote it as a recursive
listStatus in my earlier comment.

Default FS: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L1814
S3A FS which optimizes for bulk listing: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2025
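
To illustrate why the distinction matters on an object store, here is a minimal toy model in plain Java. It is not Hadoop or S3 code: the class ListingSketch, its methods, and the key names are all hypothetical, and pagination is ignored. It assumes that a recursive tree walk costs one list request per directory, while a bulk listing by key prefix costs a single request. In Hadoop itself the call under discussion is FileSystem.listFiles(path, true), which returns a RemoteIterator<LocatedFileStatus>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of an object store (hypothetical; not Hadoop or S3 code). */
public class ListingSketch {
    private final TreeSet<String> keys = new TreeSet<>();
    int requests = 0; // number of simulated LIST round trips

    void put(String key) { keys.add(key); }

    /** One simulated LIST call with a "/" delimiter: immediate children only. */
    List<String> listOneLevel(String dir) {
        requests++;
        TreeSet<String> children = new TreeSet<>();
        for (String key : keys.tailSet(dir, true)) {
            if (!key.startsWith(dir)) break;   // left the prefix range
            String rest = key.substring(dir.length());
            if (rest.isEmpty()) continue;      // skip a marker key for dir itself
            int slash = rest.indexOf('/');
            // A child ending in "/" stands for a subdirectory.
            children.add(slash < 0 ? key : dir + rest.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }

    /** One simulated LIST call without a delimiter: every key under the prefix. */
    List<String> listByPrefix(String dir) {
        requests++;
        List<String> out = new ArrayList<>();
        for (String key : keys.tailSet(dir, true)) {
            if (!key.startsWith(dir)) break;
            if (!key.equals(dir)) out.add(key);
        }
        return out;
    }

    /** Tree walk: one LIST call per directory visited. */
    List<String> walk(String dir) {
        List<String> files = new ArrayList<>();
        for (String child : listOneLevel(dir)) {
            if (child.endsWith("/")) files.addAll(walk(child));
            else files.add(child);
        }
        return files;
    }

    public static void main(String[] args) {
        ListingSketch store = new ListingSketch();
        store.put("tbl/p=1/a");
        store.put("tbl/p=1/b");
        store.put("tbl/p=2/c");
        store.put("tbl/p=2/sub/d");

        List<String> walked = store.walk("tbl/");
        int walkRequests = store.requests;
        store.requests = 0;
        List<String> flat = store.listByPrefix("tbl/");
        // prints "true 4 vs 1"
        System.out.println(walked.equals(flat) + " " + walkRequests + " vs " + store.requests);
    }
}
```

Both approaches return the same files here, but the walk issues one request per directory (four in this layout) while the prefix listing issues one. Real request counts depend on page size and on the FileSystem implementation.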


 

> don't use globStatus on S3 in MM tables
> ---------------------------------------
>
>                 Key: HIVE-14953
>                 URL: https://issues.apache.org/jira/browse/HIVE-14953
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Sergey Shelukhin
>             Fix For: hive-14535
>
>         Attachments: HIVE-14953.patch
>
>
> Need to investigate whether a recursive get is faster. Also, a normal listStatus might suffice
> because the MM code handles the directory structure in a more definite manner than the old
> code, so it knows where the files of interest are to be found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
