hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Commented] (HIVE-5189) make batching in partition retrieval in metastore applicable to more methods
Date Wed, 11 Sep 2013 02:15:51 GMT


Sergey Shelukhin commented on HIVE-5189:

It seems there are two ways to make this work uniformly for all flavors of
getPartitions[WithFoo][ByBar], of which there are several (by filter, by names, by regex,
with auth, all partitions, etc.).
First, for each filtering call except by-names, add a call that will return the filtered names
rather than the partitions; then, for each return flavor, add getPartitionsByNamesWithFoo. The
new client will get the names and then control the batching itself. This has the advantage of
less API breakage, but the disadvantage of making two calls in most cases where only one is
necessary (typically a small number of partitions is retrieved and OOM is not a problem).
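
A minimal sketch of this first option, assuming a hypothetical in-memory stand-in for the
metastore (InMemorySource, listFilteredNames and fetchInBatches are illustrative names, not
existing Hive APIs; only the getPartitionsByNames naming follows the comment above). It shows
the extra round trip: one call for the names, then one call per batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NameBatchingSketch {
    /** Hypothetical in-memory stand-in for the metastore; not the real Hive API. */
    public static final class InMemorySource {
        private final List<String> allNames;
        public int calls = 0; // number of simulated round trips

        public InMemorySource(List<String> allNames) {
            this.allNames = new ArrayList<>(allNames);
        }

        /** Filtering call that returns names only (a prefix match stands in for a real filter). */
        public List<String> listFilteredNames(String prefix) {
            calls++;
            List<String> out = new ArrayList<>();
            for (String n : allNames) {
                if (n.startsWith(prefix)) out.add(n);
            }
            return out;
        }

        /** By-names retrieval; a string stands in for a full partition object. */
        public List<String> getPartitionsByNames(List<String> names) {
            calls++;
            List<String> out = new ArrayList<>();
            for (String n : names) out.add("Partition(" + n + ")");
            return out;
        }
    }

    /** Client-controlled batching: fetch the names once, then the partitions in slices. */
    public static List<String> fetchInBatches(InMemorySource src, String prefix, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<String> names = src.listFilteredNames(prefix);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            int end = Math.min(i + batchSize, names.size());
            result.addAll(src.getPartitionsByNames(names.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        InMemorySource src = new InMemorySource(
                Arrays.asList("ds=1", "ds=2", "ds=3", "ds=4", "ds=5"));
        List<String> parts = fetchInBatches(src, "ds=", 2);
        System.out.println(parts.size() + " partitions in " + src.calls + " calls");
    }
}
```

Even when everything fits in a single batch, this scheme costs two calls (names, then
partitions), which is the disadvantage noted above.
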
Second, add APIs with batching that would take a max count (some APIs already do), as well
as the last retrieved partition name, which can then be added as an additional condition
(partName > lastName) to all JDO and SQL queries used for retrieval. The client would send
that name on subsequent calls. This has the disadvantage of requiring slightly more new APIs,
and backward compat code in the client. Old APIs will be deprecated, and removed 1-2 versions
later. New APIs can use request-response structs as parameter and return value, which will
allow adding args/etc. in the future without breaking backward compat.
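
A minimal sketch of this second option, assuming results are returned ordered by partition
name so that the last name of one batch can serve as the lower bound for the next. All names
here (Request, Response, Server, fetchAll) are hypothetical and a sorted in-memory set stands
in for the JDO/SQL query; this is not the actual metastore API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class KeysetBatchingSketch {
    /** Hypothetical request struct: new fields could be added later without a new method. */
    public static final class Request {
        public final String lastName; // null on the first call
        public final int maxCount;
        public Request(String lastName, int maxCount) {
            this.lastName = lastName;
            this.maxCount = maxCount;
        }
    }

    /** Hypothetical response struct; names stand in for full partition objects. */
    public static final class Response {
        public final List<String> names;
        public Response(List<String> names) { this.names = names; }
    }

    /** In-memory stand-in for the server side; a TreeSet replaces the real query. */
    public static final class Server {
        private final TreeSet<String> partNames = new TreeSet<>();
        public int calls = 0; // number of simulated round trips

        public Server(Collection<String> names) { partNames.addAll(names); }

        public Response getPartitions(Request req) {
            calls++;
            // Plays the role of "partName > lastName", ordered by name, limited to maxCount.
            NavigableSet<String> tail = (req.lastName == null)
                    ? partNames
                    : partNames.tailSet(req.lastName, false);
            List<String> page = new ArrayList<>();
            for (String n : tail) {
                if (page.size() >= req.maxCount) break;
                page.add(n);
            }
            return new Response(page);
        }
    }

    /** Client loop: resend the last name seen until a short (or empty) batch comes back. */
    public static List<String> fetchAll(Server server, int maxCount) {
        if (maxCount <= 0) throw new IllegalArgumentException("maxCount must be positive");
        List<String> all = new ArrayList<>();
        String last = null;
        while (true) {
            Response r = server.getPartitions(new Request(last, maxCount));
            all.addAll(r.names);
            if (r.names.size() < maxCount) break;
            last = r.names.get(r.names.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        Server server = new Server(Arrays.asList("ds=3", "ds=1", "ds=2", "ds=5", "ds=4"));
        List<String> all = fetchAll(server, 2);
        System.out.println(all + " in " + server.calls + " calls");
    }
}
```

In the common case where the result fits within maxCount, the loop makes a single call, unlike
the names-first scheme.
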
> make batching in partition retrieval in metastore applicable to more methods
> ----------------------------------------------------------------------------
>                 Key: HIVE-5189
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Sergey Shelukhin
> As indicated in HIVE-5158, Metastore can OOM if retrieving a large number of partitions.
> For client-side partition filtering, the client applies batching (that would avoid that) by
> sending parts of the filtered name list in separate requests according to configuration.
> The batching is not used on the filter pushdown path, or when retrieving all partitions
> (e.g. when the pruner expression is not useful in non-strict mode). HIVE-4914 and pushdown
> improvements will make this problem somewhat worse by allowing more requests to go to the
> There needs to be some batching scheme (ideally, a somewhat generic one) that would be
> applicable to all these paths.
