drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4126) Adding HiveMetaStore caching when impersonation is enabled.
Date Tue, 01 Dec 2015 18:58:10 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034359#comment-15034359 ]

ASF GitHub Bot commented on DRILL-4126:

GitHub user jinfengni opened a pull request:


    DRILL-4127: Reduce Hive metastore client API calls in HiveSchema

    This pull request also includes the commit for DRILL-4126: Add cache to HiveSchema in order
    to reduce long planning time or execution time caused by a slow Hive metastore.
    Both DRILL-4127 and DRILL-4126 address the long delays caused by a slow Hive metastore.

    The patches passed unit tests, pre-commit regression tests, and additional impersonation
    tests before being rebased onto the latest master.
    I will re-run the above tests.
    @vkorukanti, could you please review the two patches? Thanks.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill DRILL-4127

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #286

commit 19a5a4d1c9c23eedcb94c988bd2229680575a118
Author: Jinfeng Ni <jni@apache.org>
Date:   2015-11-19T04:18:51Z

    DRILL-4127: Reduce Hive metastore client API calls in HiveSchema.
    1) Use lazy loading of tableNames in HiveSchema, instead of pre-loading all table names
       under each HiveSchema.
    2) Do not call get_all_databases for subSchema to check existence if the name comes from
       getSubSchemaNames() directly.
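
The lazy loading described in item 1 can be illustrated with a minimal sketch. This is not the code from the patch: `LazyTableNames` and its `loader` parameter are hypothetical names, and the supplier merely stands in for whatever metastore call returns the table names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical holder that fetches a schema's table names on first use only. */
final class LazyTableNames {
  private final Supplier<List<String>> loader; // assumption: stands in for a metastore call
  private List<String> names;                  // stays null until first requested

  LazyTableNames(Supplier<List<String>> loader) {
    this.loader = loader;
  }

  /** Returns the table names, invoking the loader at most once. */
  synchronized List<String> get() {
    if (names == null) {
      names = Collections.unmodifiableList(new ArrayList<>(loader.get()));
    }
    return names;
  }
}
```

Constructing the holder costs nothing; the loader runs only if some query actually asks for the names, which is the saving that pre-loading every schema gives up.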

commit 9570319c227649144d3a14f8d5774fbe4a282bc4
Author: Jinfeng Ni <jni@apache.org>
Date:   2015-11-30T04:15:07Z

    DRILL-4126: Add cache to HiveSchema in order to reduce long planning time or execution
    time caused by a slow Hive metastore.
    1) HiveSchema caching will help in case impersonation is enabled.
    2) Use a flat-level cache for tables in DrillHiveMetaStoreClient.
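
One way to read "flat level cache" in item 2 is a single map keyed by the (database, table) pair, rather than one nested cache per database. The sketch below shows that idea under stated assumptions: `FlatTableCache` and its `loader` are hypothetical, it uses only the JDK, and it is not the implementation in DrillHiveMetaStoreClient.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical flat table cache: one map keyed by (database, table). */
final class FlatTableCache<V> {
  private final Map<SimpleImmutableEntry<String, String>, V> cache = new ConcurrentHashMap<>();
  private final BiFunction<String, String, V> loader; // assumption: stands in for a metastore lookup

  FlatTableCache(BiFunction<String, String, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached entry, loading it on the first request for this pair. */
  V get(String db, String table) {
    return cache.computeIfAbsent(
        new SimpleImmutableEntry<>(db, table),
        key -> loader.apply(key.getKey(), key.getValue()));
  }
}
```

A real cache would also need expiry or invalidation so that metastore changes become visible; that is left out here to keep the sketch short.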


> Adding HiveMetaStore caching when impersonation is enabled. 
> ------------------------------------------------------------
>                 Key: DRILL-4126
>                 URL: https://issues.apache.org/jira/browse/DRILL-4126
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
> Currently, HiveMetastore caching is used only when impersonation is disabled, such that
> all Hive metastore calls go through NonCloseableHiveClientWithCaching [1]. However, if
> impersonation is enabled, caching is not used for Hive metastore access.
> This can significantly increase planning time when the Hive storage plugin is enabled,
> or when running a query against INFORMATION_SCHEMA. Depending on the number of databases
> and tables in the Hive storage plugin, the time taken by planning or by an
> INFORMATION_SCHEMA query can become unacceptable. This gets even worse if the Hive
> metastore runs on a different node from the drillbit, which makes metastore access even
> slower.
> We are seeing planning time, or execution time for an INFORMATION_SCHEMA query, of 30 to
> 60 seconds. Such long planning or execution times prevent Drill from behaving
> "interactively" for these queries.
> We should enable caching when impersonation is used. As long as the authorizer verifies
> that the user has access to the databases/tables, we should serve the data from the
> cache. By doing that, we should see a reduced number of API calls to the Hive metastore.
> [1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299
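
The proposal in the issue description (verify access first, then serve from a cache shared across users) might look roughly like the sketch below. All names are hypothetical: `AuthorizedCache`, the `authorizer` predicate, and the `loader` function are illustrative stand-ins, not Drill or Hive APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical shared cache that checks authorization before every lookup. */
final class AuthorizedCache<V> {
  private final Map<String, V> cache = new ConcurrentHashMap<>();
  private final BiPredicate<String, String> authorizer; // assumption: (user, table) -> allowed?
  private final Function<String, V> loader;             // assumption: stands in for a metastore call

  AuthorizedCache(BiPredicate<String, String> authorizer, Function<String, V> loader) {
    this.authorizer = authorizer;
    this.loader = loader;
  }

  /** Rejects unauthorized users, then serves the entry from the shared cache. */
  V get(String user, String table) {
    if (!authorizer.test(user, table)) {
      throw new SecurityException(user + " may not access " + table);
    }
    return cache.computeIfAbsent(table, loader);
  }
}
```

Because the authorization check runs on every call and only the metadata is shared, two authorized users asking for the same table cost one metastore call, while an unauthorized user never reaches the cache.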

This message was sent by Atlassian JIRA