Date: Fri, 22 Jul 2016 00:21:20 +0000 (UTC)
From: "Dechang Gu (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Closed] (DRILL-4127) HiveSchema.getSubSchema() should use lazy loading of all the table names

[ https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dechang Gu closed DRILL-4127.
-----------------------------

Verified with perf test framework.
Without the patch (commit id: 539cbba):
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 126599 msec
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 165969 msec
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 163977 msec

With the patch (Apache Drill 1.5.0 GA, commit id: 3f228d3), the same query:
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 1664 msec
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 157 msec
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 167 msec

So, LGTM.

> HiveSchema.getSubSchema() should use lazy loading of all the table names
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4127
>                 URL: https://issues.apache.org/jira/browse/DRILL-4127
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
>
>
> Currently, HiveSchema.getSubSchema() pre-loads all the table names when it constructs the subschema, even though those table names are not requested at all. This can cause considerable performance overhead, especially when the Hive schema contains a large number of objects (thousands of tables/views are not uncommon in some use cases).
> Instead, we should load table names on demand: only when there is a request to get all table names do we load them into the Hive schema.
> This should help "show schemas", since it only requires the schema name, not the table names in the schema.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
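
The on-demand loading described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DRILL-4127 patch: the names LazyTableNamesSketch, LazySubSchema and CountingLoader are hypothetical, and the loader stands in for whatever call fetches table names from the Hive metastore.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class LazyTableNamesSketch {

    /** Hypothetical stand-in for a Hive sub-schema; not Drill's actual class. */
    static final class LazySubSchema {
        private final String name;
        // Assumed to wrap the expensive "list all tables" metastore call.
        private final Supplier<List<String>> tableNameLoader;
        // Stays null until table names are requested for the first time.
        private List<String> tableNames;

        LazySubSchema(String name, Supplier<List<String>> tableNameLoader) {
            // Construction records the loader but does not invoke it.
            this.name = name;
            this.tableNameLoader = tableNameLoader;
        }

        /** Needs only the schema name, so it never triggers a load. */
        String getName() {
            return name;
        }

        /** Loads the table names on first call and reuses them afterwards. */
        synchronized List<String> getTableNames() {
            if (tableNames == null) {
                tableNames = tableNameLoader.get();
            }
            return tableNames;
        }
    }

    /** Hypothetical loader that counts how often it is invoked. */
    static final class CountingLoader implements Supplier<List<String>> {
        int calls = 0;

        @Override
        public List<String> get() {
            calls++;
            return Arrays.asList("orders", "customers");
        }
    }

    public static void main(String[] args) {
        CountingLoader loader = new CountingLoader();
        LazySubSchema schema = new LazySubSchema("hive.default", loader);

        // A name-only lookup, as "show schemas" would need.
        System.out.println(schema.getName());
        System.out.println("loads after name lookup: " + loader.calls);

        // Listing tables twice; only the first call reaches the loader.
        System.out.println(schema.getTableNames());
        schema.getTableNames();
        System.out.println("loads after two listings: " + loader.calls);
    }
}
```

In this sketch the loader is not called when the sub-schema is constructed or when only its name is read, and it runs once no matter how many times the table names are listed afterwards. Whether the real fix caches the names or reloads them per request is not stated in the issue; caching here is an assumption.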