drill-dev mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4126) Adding HiveMetaStore caching when impersonation is enabled.
Date Tue, 24 Nov 2015 20:34:10 GMT
Jinfeng Ni created DRILL-4126:

             Summary: Adding HiveMetaStore caching when impersonation is enabled. 
                 Key: DRILL-4126
                 URL: https://issues.apache.org/jira/browse/DRILL-4126
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni

Currently, HiveMetaStore caching is used only when impersonation is disabled, in which case all
HiveMetaStore calls go through NonCloseableHiveClientWithCaching [1]. When impersonation
is enabled, caching is not used for HiveMetaStore access.

This can significantly increase planning time when the Hive storage plugin is enabled, or
the time to run a query against INFORMATION_SCHEMA. Depending on the number of databases and tables
in the Hive storage plugin, the planning time or the INFORMATION_SCHEMA query time can become
unacceptable. This gets even worse if the Hive metastore runs on a different node from the drillbit,
which makes access to the metastore even slower.

We are seeing planning time, or execution time for an INFORMATION_SCHEMA query, of 30 to 60
seconds. Such long planning or execution times prevent Drill from acting "interactively" for
these queries.

We should enable caching when impersonation is used. As long as the authorizer verifies that the
user has access to the databases/tables, we should serve the data from the cache. Doing so
should reduce the number of API calls to HiveMetaStore.
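
The idea above (authorize per user, then serve from a shared cache) can be sketched roughly as
follows. This is only an illustration under stated assumptions, not Drill's actual implementation:
the Metastore and Authorizer interfaces and the CachingMetastoreSketch class are hypothetical
names invented for this example, and the real Hive client and authorizer APIs differ.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a remote Hive metastore client (not the real Hive API). */
interface Metastore {
    List<String> getTableNames(String db);
}

/** Hypothetical per-user access check (not the real Hive authorizer API). */
interface Authorizer {
    boolean canRead(String user, String db);
}

/** Sketch: one cache shared by all impersonated users, guarded by a per-user check. */
final class CachingMetastoreSketch {
    private final Metastore remote;
    private final Authorizer authorizer;
    // Keyed by database name only, so every authorized user reuses the same entry.
    private final Map<String, List<String>> tableNamesByDb = new ConcurrentHashMap<>();

    CachingMetastoreSketch(Metastore remote, Authorizer authorizer) {
        this.remote = remote;
        this.authorizer = authorizer;
    }

    List<String> getTableNames(String user, String db) {
        // The authorization check runs on every call and is never cached,
        // so a cached entry cannot leak to a user without access.
        if (!authorizer.canRead(user, db)) {
            throw new SecurityException("User " + user + " may not read database " + db);
        }
        // The remote metastore is called only on a cache miss.
        return tableNamesByDb.computeIfAbsent(db, remote::getTableNames);
    }
}
```

In this sketch, two different authorized users asking for the tables of the same database
trigger a single remote call, while an unauthorized user is rejected before the cache is
consulted. Cache expiry and invalidation are deliberately left out.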

[1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299
