drill-issues mailing list archives

From "Zelaine Fong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4126) Adding HiveMetaStore caching when impersonation is enabled.
Date Tue, 22 Dec 2015 16:48:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068386#comment-15068386 ]

Zelaine Fong commented on DRILL-4126:

See DRILL-4217. The changes there may also be needed to keep cached objects in the cache
long enough to be beneficial. With the existing hard-coded invalidation period of 1 minute,
entries may expire before they are reused, so caching may show little or no benefit.

> Adding HiveMetaStore caching when impersonation is enabled. 
> ------------------------------------------------------------
>                 Key: DRILL-4126
>                 URL: https://issues.apache.org/jira/browse/DRILL-4126
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
> Currently, Hive metastore caching is used only when impersonation is disabled, in which
> case all Hive metastore calls go through NonCloseableHiveClientWithCaching [1]. When
> impersonation is enabled, caching is not used for Hive metastore access.
>
> This can significantly increase planning time when the Hive storage plugin is enabled,
> or when running a query against INFORMATION_SCHEMA. Depending on the number of databases
> and tables in the Hive storage plugin, the planning time or the INFORMATION_SCHEMA query
> time can become unacceptable. It gets even worse if the Hive metastore runs on a different
> node from the drillbit, which makes metastore access even slower.
>
> We are seeing planning times, or execution times for INFORMATION_SCHEMA queries, of
> 30~60 seconds. Such long planning or execution times prevent Drill from acting
> "interactively" for these queries.
>
> We should enable caching when impersonation is used. As long as the authorizer verifies
> that the user has access to the databases/tables, we should get the data from the cache.
> By doing that, we should see a reduced number of API calls to the Hive metastore.
>
> [1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299
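
The proposal in the description (check authorization per user, then serve metadata from a
shared cache) can be sketched roughly as follows. This is only an illustration under
stated assumptions, not the code in DrillHiveMetaStoreClient: the class name
AuthorizedTableCache, the loader and authorizer callbacks, and the configurable TTL are
all hypothetical stand-ins, and a plain ConcurrentHashMap is used here in place of
whatever cache implementation Drill actually uses.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Drill or Hive.
final class AuthorizedTableCache {

    // A cached table list plus the time it was loaded.
    private static final class Entry {
        final List<String> tables;
        final long loadedAtMillis;

        Entry(List<String> tables, long loadedAtMillis) {
            this.tables = tables;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> metastoreLoader; // stands in for a metastore call
    private final BiPredicate<String, String> authorizer;         // (user, database) -> allowed?
    private final long ttlMillis;                                 // assumed configurable, not hard-coded

    // Counts loader invocations so the effect of caching can be observed.
    final AtomicInteger metastoreCalls = new AtomicInteger();

    AuthorizedTableCache(Function<String, List<String>> metastoreLoader,
                         BiPredicate<String, String> authorizer,
                         long ttlMillis) {
        this.metastoreLoader = metastoreLoader;
        this.authorizer = authorizer;
        this.ttlMillis = ttlMillis;
    }

    List<String> getTables(String user, String database) {
        // Authorization is checked for every request, before the shared cache is consulted.
        if (!authorizer.test(user, database)) {
            throw new SecurityException(user + " may not read " + database);
        }
        long now = System.currentTimeMillis();
        // Reuse a fresh entry; reload it from the metastore if it is missing or expired.
        Entry entry = cache.compute(database, (db, old) -> {
            if (old != null && now - old.loadedAtMillis < ttlMillis) {
                return old;
            }
            metastoreCalls.incrementAndGet();
            return new Entry(List.copyOf(metastoreLoader.apply(db)), now);
        });
        return entry.tables;
    }
}
```

In this sketch, two authorized users asking for the same database within the TTL cause
only one loader call, while an unauthorized user is rejected without touching the cache.
The TTL is a constructor parameter here only to illustrate the point of the comment above
about the invalidation period; how DRILL-4217 actually exposes it is not shown.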

This message was sent by Atlassian JIRA