drill-issues mailing list archives

From "Dechang Gu (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (DRILL-4127) HiveSchema.getSubSchema() should use lazy loading of all the table names
Date Fri, 22 Jul 2016 00:21:20 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dechang Gu closed DRILL-4127.
-----------------------------

verified with perf test framework.
without the patch (commit id: 539cbba):

91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 126599 msec
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 165969 msec
91_539cbba_HIVE_20160720_113024/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 163977 msec


with the patch (Apache Drill 1.5.0 GA, commit id: 3f228d3), the same query:
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 1664 msec
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 157 msec
95_3f228d3_HIVE_20160721_130712/HIVE_limit1_02/HIVE_limit1_02.log:[STAT] TOTAL TIME : 167 msec


So, LGTM.

> HiveSchema.getSubSchema() should use lazy loading of all the table names
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4127
>                 URL: https://issues.apache.org/jira/browse/DRILL-4127
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
>
>
> Currently, HiveSchema.getSubSchema() pre-loads all the table names when it constructs
> the subschema, even if those table names are never requested. This can cause considerable
> performance overhead, especially when the Hive schema contains a large number of objects
> (thousands of tables/views are not uncommon in some use cases).
> Instead, we should load table names on demand: only when there is a request to get
> all table names do we load them into the Hive schema.
> This should help "show schemas", since it only requires the schema name, not the table
> names in the schema.
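
The on-demand loading described above can be sketched roughly as follows. This is a minimal illustration of the lazy-loading idea only, not the actual DRILL-4127 patch: `LazySchema`, its `loader` parameter, and the sample schema and table names are all hypothetical and do not correspond to Drill's real `HiveSchema` API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a sub-schema; NOT Drill's actual HiveSchema. */
class LazySchema {
    private final String name;
    // Hypothetical: wraps the expensive call that lists tables in the metastore.
    private final Supplier<List<String>> loader;
    // Stays null until table names are requested for the first time.
    private List<String> tableNames;

    LazySchema(String name, Supplier<List<String>> loader) {
        this.name = name;
        this.loader = loader; // constructing the schema does not invoke the loader
    }

    /** Cheap: needs only the schema name, so it never triggers a load. */
    String getName() {
        return name;
    }

    /** Loads the table names on the first call and caches them afterwards. */
    synchronized List<String> getTableNames() {
        if (tableNames == null) {
            tableNames = loader.get();
        }
        return tableNames;
    }
}

class LazySchemaDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger(); // counts expensive loads
        LazySchema schema = new LazySchema("hive.default", () -> {
            loads.incrementAndGet();
            return Arrays.asList("orders", "customers"); // made-up table names
        });

        // Asking only for the schema name does not touch the loader.
        System.out.println(schema.getName() + " loads=" + loads.get());

        // The first request for table names triggers a load; later ones reuse the cache.
        System.out.println(schema.getTableNames() + " loads=" + loads.get());
        schema.getTableNames();
        System.out.println("loads=" + loads.get());
    }
}
```

With this shape, a caller that only needs schema names pays nothing for schemas holding thousands of tables, and the cost of listing tables is paid at most once per schema object.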



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
