hive-issues mailing list archives

From "Mithun Radhakrishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17956) Retrieve "latest" partition from Hive Metastore
Date Wed, 01 Nov 2017 17:49:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234468#comment-16234468 ]

Mithun Radhakrishnan commented on HIVE-17956:
---------------------------------------------

Hello, [~mkwhitacre].

HIVE-17466 might be of interest to you. It adds metastore calls that return the unique
values for specified partition keys. In the {{PartitionValuesRequest}}, one can specify the
required partition keys (e.g. {{dt}}), a filter (e.g. {{dt > "20170101" && dt < "20171231"}}),
a sort order (ascending/descending), and a limit (e.g. top 10).

The implementation sorts and filters on the metastore side (in fact, the work is pushed down
to the database). It does not need to construct complete {{Partition}} objects, which makes it
much more memory-friendly and efficient than retrieving every partition and sorting on the
client. It's in {{master}} and {{branch-2}}. I have yet to port this to {{branch-2.2}}.
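
As a rough illustration of the semantics described above (distinct values for one partition key, a filter, a sort order, and a limit), here is a minimal, self-contained sketch. It deliberately does not call the Hive API: {{LatestPartitionSketch}} and {{distinctValues}} are hypothetical names, the filter is modelled as a plain Java predicate rather than a metastore filter string, and everything runs in memory instead of being pushed down to the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical model of a "partition values" request; NOT the Hive metastore API.
public class LatestPartitionSketch {

    // Returns up to `limit` distinct values that pass `filter`, in the requested order.
    // Only the key values are handled; no full partition objects are built.
    static List<String> distinctValues(List<String> partitionValues,
                                       Predicate<String> filter,
                                       boolean ascending,
                                       int limit) {
        Comparator<String> order = ascending
                ? Comparator.<String>naturalOrder()
                : Comparator.<String>reverseOrder();

        // A sorted set gives de-duplication and ordering in one step.
        TreeSet<String> distinct = new TreeSet<>(order);
        for (String value : partitionValues) {
            if (filter.test(value)) {
                distinct.add(value);
            }
        }

        List<String> result = new ArrayList<>();
        for (String value : distinct) {
            if (result.size() >= limit) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample values of a `dt` partition key, including a duplicate.
        List<String> dt = Arrays.asList(
                "20161231", "20170315", "20170315", "20171101", "20180101");

        // Same range as the example filter: dt > "20170101" && dt < "20171231".
        Predicate<String> in2017 =
                v -> v.compareTo("20170101") > 0 && v.compareTo("20171231") < 0;

        // Descending order with a limit of 1 yields the "latest" value in range.
        System.out.println(distinctValues(dt, in2017, false, 1)); // prints [20171101]
    }
}
```

Asking for descending order with a limit of 1 is what turns a generic "list values" call into a "latest partition" lookup; in the real feature that sorting and limiting would happen on the metastore side rather than in client memory as it does here.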

HIVE-17467 wraps this raw API in {{HCatClient}}, which is easier to use. This was
developed for use in Oozie (for discovery of data dependencies), and a couple of other projects.
The unit test in that patch shows how to use it. The patch has yet to be reviewed or checked
in, I'm afraid.

Would this suit your requirement?



> Retrieve "latest" partition from Hive Metastore
> -----------------------------------------------
>
>                 Key: HIVE-17956
>                 URL: https://issues.apache.org/jira/browse/HIVE-17956
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Micah Whitacre
>
> We are trying to utilize the Hive Metastore for our processing needs, specifically focusing
> on consuming through the HCatalog APIs.  One use case we have is that we want to consume the
> "latest" partition.  In researching there are a number of posts[1][2] that talk about using
> queries through Hive Server2 to find that information.  It would be more ideal if this was
> a first class API offered from the Hive Metastore without requiring a query to be executed.
> The other option would be to retrieve all of the partitions and sort client side.  There
> is a concern about the efficiency and memory requirements of this especially without the "iterator"
> concept implemented from HIVE-7195.
> [1] - https://community.hortonworks.com/questions/85330/how-to-optimize-hive-access-to-the-latest-partitio.html
> [2] - https://stackoverflow.com/questions/36095790/how-to-find-the-most-recent-partition-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
