hive-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables
Date Fri, 21 Oct 2016 01:31:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593627#comment-15593627 ]

Rajesh Balamohan commented on HIVE-14953:
-----------------------------------------

[~sershe] - It was specifically in FileSinkOperator.handleMMTable (getMmDirectoryCandidates).
I no longer see that code path in the latest code in the branch. globStatus with a pattern
needs to be replaced with {{listStatus(path, boolean recursive)}}, with any additional filtering
pattern applied on the client side. Cloud storage systems can serve a prefix listing, which
reduces the number of calls significantly compared to globStatus, which iterates through the
files one at a time on the client side.
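The following minimal Java sketch illustrates the "filter on the client side" step. It assumes the input is the flat result of a single recursive prefix listing, expressed as paths relative to the listing root. In Hadoop's public FileSystem API, the recursive call is listFiles(Path f, boolean recursive). That call returns files only, not directories, as a RemoteIterator<LocatedFileStatus>. To keep the sketch runnable without Hadoop, the listing is a plain List<String>. The class name ClientSideFilter and the directory names (mm_1, mm_2, _tmp) are hypothetical. The pattern is matched with the JDK's glob PathMatcher rather than Hadoop's own glob code, so edge-case semantics may differ.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ClientSideFilter {

    /**
     * Keeps only the relative paths matching a glob such as "mm_*/*".
     * Assumption: relativePaths stands in for the keys returned by ONE
     * recursive prefix listing (e.g. Hadoop's FileSystem.listFiles(root, true)),
     * so no further storage calls are made here; filtering is purely local.
     */
    static List<String> filter(List<String> relativePaths, String glob) {
        // JDK glob semantics: '*' stays within one path segment, '**' crosses segments.
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matches = new ArrayList<>();
        for (String p : relativePaths) {
            if (matcher.matches(Paths.get(p))) {
                matches.add(p);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical listing result; directory names are illustrative only.
        List<String> listing = List.of(
                "mm_1/000000_0",
                "mm_1/000001_0",
                "mm_2/000000_0",
                "_tmp/000000_0");
        System.out.println(filter(listing, "mm_*/*"));
    }
}
```

Running main prints [mm_1/000000_0, mm_1/000001_0, mm_2/000000_0]. The _tmp entry is dropped by local filtering, not by issuing additional listing calls against the store.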

> don't use globStatus on S3 in MM tables
> ---------------------------------------
>
>                 Key: HIVE-14953
>                 URL: https://issues.apache.org/jira/browse/HIVE-14953
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Sergey Shelukhin
>             Fix For: hive-14535
>
>         Attachments: HIVE-14953.patch
>
>
> Need to investigate if recursive get is faster. Also, normal listStatus might suffice
> because MM code handles directory structure in a more definite manner than old code; so it
> knows where the files of interest are to be found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
