hive-issues mailing list archives

From "Steve Loughran (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-22819) Refactor Hive::listFilesCreatedByQuery to make it faster for object stores
Date Tue, 25 Feb 2020 14:14:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044471#comment-17044471 ]

Steve Loughran commented on HIVE-22819:
---------------------------------------

LGTM. This saves two round trips to HDFS, S3 or ABFS.

> Refactor Hive::listFilesCreatedByQuery to make it faster for object stores
> --------------------------------------------------------------------------
>
>                 Key: HIVE-22819
>                 URL: https://issues.apache.org/jira/browse/HIVE-22819
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>         Attachments: HIVE-22819.1.patch, HIVE-22819.2.patch, HIVE-22819.3.patch, HIVE-22819.4.patch
>
>
> Hive::listFilesCreatedByQuery does an exists(), an isDir() and then a listing call.
> This can be expensive in object stores. We should instead list the files in the
> directory directly (we'd have to handle an exception if the directory does not exist,
> but issuing a single call to the object store would most likely still be more performant).
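
The change described above can be sketched roughly as follows. This is only an illustration of the pattern, not the code in the attached patches: it uses java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API, and the class and method names (DirectListing, listFilesWithChecks, listFilesDirect) are hypothetical. Treating a path that is not a directory as empty is also an assumption; the description only mentions the missing-directory case. In Hadoop the corresponding listing call would presumably be FileSystem.listStatus, which throws FileNotFoundException for a missing path.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the Hive implementation.
public class DirectListing {

    // Three-step shape from the description: exists, isDirectory, then list.
    // Against an object store, each step would be a separate remote call.
    static List<Path> listFilesWithChecks(Path dir) throws IOException {
        if (!Files.exists(dir) || !Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        return collect(dir);
    }

    // Proposed shape: issue the listing directly and treat a missing
    // directory (or, by assumption, a non-directory) as "no files".
    static List<Path> listFilesDirect(Path dir) throws IOException {
        try {
            return collect(dir);
        } catch (NoSuchFileException | NotDirectoryException e) {
            return Collections.emptyList();
        }
    }

    // Shared helper: read every entry of the directory into a list.
    private static List<Path> collect(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                files.add(entry);
            }
        }
        return files;
    }
}
```

Both methods return the same result for an existing directory and for a missing one; the second simply folds the two precondition checks into the exception handler, which is where the saved round trips would come from.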



