drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4127) HiveSchema.getSubSchema() should use lazy loading of all the table names
Date Wed, 02 Dec 2015 21:30:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036677#comment-15036677 ]

Jinfeng Ni commented on DRILL-4127:
-----------------------------------

For a Hive storage plugin with about 8 schemas/databases, if I run a simple query like this:

select count(*) from hive.table1;

From hive.log, we saw the following number of Hive metastore API calls:

Without the patch, with impersonation turned on:
1. Number of get_all_databases API calls: 31
2. Number of get_all_tables API calls: 30
3. Number of get_table API calls: 2

That explains why some Drill users report that Drill spent 20-30 seconds planning such a
simple query, making the query not "interactive" at all.



> HiveSchema.getSubSchema() should use lazy loading of all the table names
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4127
>                 URL: https://issues.apache.org/jira/browse/DRILL-4127
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> Currently, HiveSchema.getSubSchema() pre-loads all the table names when it constructs
> the subschema, even if those table names are never requested. This can cause considerable
> performance overhead, especially when the Hive schema contains a large number of objects
> (thousands of tables/views are not uncommon in some use cases).
> Instead, we should load table names on demand: only when there is a request for all
> table names do we load them into the Hive schema.
> This should help "show schemas", since it only requires the schema names, not the table
> names in each schema.
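The on-demand loading proposed above can be sketched roughly as follows. This is a minimal illustration, not the actual DRILL-4127 patch: MetastoreClient, LazySubSchema and CountingClient are hypothetical names standing in for the real Hive metastore client and Drill's HiveSchema classes. The sub-schema only knows its name at construction time and calls getAllTables() the first time table names are requested, so listing schemas triggers no get_all_tables calls. It assumes Java 9+ (for List.of).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LazyHiveSchemaSketch {

    // Hypothetical stand-in for a Hive metastore client (not the real Hive or Drill API).
    interface MetastoreClient {
        List<String> getAllDatabases();
        List<String> getAllTables(String dbName);
    }

    // Hypothetical sub-schema that defers the get_all_tables call until table names are needed.
    static final class LazySubSchema {
        private final String dbName;
        private final MetastoreClient client;
        private Set<String> tableNames; // stays null until first requested

        LazySubSchema(String dbName, MetastoreClient client) {
            this.dbName = dbName;
            this.client = client;
        }

        // The schema name is known up front, so no metastore call is made here.
        String getName() {
            return dbName;
        }

        // Loads table names on the first request only; synchronized so that
        // concurrent callers do not trigger duplicate metastore calls.
        synchronized Set<String> getTableNames() {
            if (tableNames == null) {
                tableNames = new LinkedHashSet<>(client.getAllTables(dbName));
            }
            return tableNames;
        }
    }

    // Hypothetical fake client that counts get_all_tables calls for demonstration.
    static final class CountingClient implements MetastoreClient {
        int getAllTablesCalls = 0;

        public List<String> getAllDatabases() {
            return List.of("default", "sales");
        }

        public List<String> getAllTables(String dbName) {
            getAllTablesCalls++;
            return List.of(dbName + "_t1", dbName + "_t2");
        }
    }

    public static void main(String[] args) {
        CountingClient client = new CountingClient();
        List<LazySubSchema> schemas = new ArrayList<>();
        for (String db : client.getAllDatabases()) {
            schemas.add(new LazySubSchema(db, client));
        }
        // A "show schemas"-style listing only needs schema names.
        for (LazySubSchema s : schemas) {
            System.out.println(s.getName());
        }
        System.out.println("get_all_tables calls after listing schemas: " + client.getAllTablesCalls);
        // Table names are fetched once, then served from the cached set.
        schemas.get(0).getTableNames();
        schemas.get(0).getTableNames();
        System.out.println("get_all_tables calls after two lookups: " + client.getAllTablesCalls);
    }
}
```

Running main should print the two schema names, then a count of 0 after listing schemas and 1 after the two table-name lookups. Caching the loaded set means repeated lookups during planning do not hit the metastore again.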



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
