Date: Tue, 24 Nov 2015 20:34:10 +0000 (UTC)
From: "Jinfeng Ni (JIRA)"
To: dev@drill.apache.org
Subject: [jira] [Created] (DRILL-4126) Adding HiveMetaStore caching when impersonation is enabled.

Jinfeng Ni created DRILL-4126:
---------------------------------

             Summary: Adding HiveMetaStore caching when impersonation is enabled.
                 Key: DRILL-4126
                 URL: https://issues.apache.org/jira/browse/DRILL-4126
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni

Currently, HiveMetaStore caching is used only when impersonation is disabled, in which case all HiveMetaStore calls go through NonCloseableHiveClientWithCaching [1]. However, when impersonation is enabled, caching is not used for HiveMetaStore access.
This could significantly increase the planning time when the Hive storage plugin is enabled, or the execution time of a query against INFORMATION_SCHEMA. Depending on the number of databases and tables in the Hive storage plugin, the planning time or the INFORMATION_SCHEMA query time could become unacceptable. This becomes even worse if the Hive metastore runs on a different node from the drillbit, which makes metastore access even slower. We are seeing planning, or execution of an INFORMATION_SCHEMA query, take 30 to 60 seconds. Such long planning or execution times prevent Drill from acting "interactively" for these queries.

We should enable caching when impersonation is used. As long as the authorizer verifies that the user has access to the databases/tables, we should serve the data from the cache. By doing that, we should see a reduced number of API calls to HiveMetaStore.

[1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
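
The proposal in the issue description (authorize each request per user, then serve the metadata from a cache shared across users) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Drill's actual DrillHiveMetaStoreClient code: the class AuthorizedMetastoreCache, the authorizer predicate, and the metastoreLoader function are hypothetical names invented for this sketch, and cache expiry or invalidation is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;
import java.util.function.Function;

/**
 * Hypothetical sketch: a table-name cache shared by all users,
 * guarded by a per-user authorization check on every lookup.
 */
class AuthorizedMetastoreCache {
    // Shared cache: database name -> table names. Not keyed by user.
    private final Map<String, List<String>> tablesByDb = new ConcurrentHashMap<>();
    // Assumed authorizer: (user, database) -> true if access is allowed.
    private final BiPredicate<String, String> authorizer;
    // Assumed stand-in for the remote metastore call: database -> table names.
    private final Function<String, List<String>> metastoreLoader;

    AuthorizedMetastoreCache(BiPredicate<String, String> authorizer,
                             Function<String, List<String>> metastoreLoader) {
        this.authorizer = authorizer;
        this.metastoreLoader = metastoreLoader;
    }

    List<String> getTableNames(String user, String db) {
        // The access check runs on every call, even on a cache hit.
        if (!authorizer.test(user, db)) {
            throw new SecurityException(user + " may not access database " + db);
        }
        // The loader runs only when the database is not cached yet.
        return tablesByDb.computeIfAbsent(db, metastoreLoader);
    }

    public static void main(String[] args) {
        AtomicInteger metastoreCalls = new AtomicInteger();
        AuthorizedMetastoreCache cache = new AuthorizedMetastoreCache(
            (user, db) -> !user.equals("mallory"),
            db -> {
                metastoreCalls.incrementAndGet();
                return Arrays.asList("orders", "customers");
            });

        cache.getTableNames("alice", "sales"); // cache miss: loader is called
        cache.getTableNames("bob", "sales");   // cache hit: loader is skipped
        System.out.println("metastore calls: " + metastoreCalls.get());
    }
}
```

In this sketch two different users reading the same database trigger a single loader call, which is the reduction in HiveMetaStore API calls the issue is after. A real implementation would also need to bound and expire cache entries, which this sketch does not attempt.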