drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
Date Thu, 07 Apr 2016 00:45:25 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229433#comment-15229433 ]

ASF GitHub Bot commented on DRILL-4577:
---------------------------------------

Github user hsuanyi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/461#discussion_r58805242
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -72,4 +80,76 @@ public String getTypeName() {
         return HiveStoragePluginConfig.NAME;
       }
     
    +  @Override
    +  public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
    +    final String schemaName = getName();
    +    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    +    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    +    // Retries once if the first call to fetch the metadata fails
    +    synchronized(mClient) {
    +      final List<String> tableNamesWithAuth = Lists.newArrayList();
    +      for(String tableName : tableNames) {
    +        try {
    +          if(mClient.tableExists(schemaName, tableName)) {
    --- End diff --
    
    I did some tests here. When there are many tables, the improvement from optimizing for
    the second objective is not significant. However, the objective of this issue only makes
    sense when there are many tables, so I still need to figure out a solution.


> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-4577
>                 URL: https://issues.apache.org/jira/browse/DRILL-4577
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>            Reporter: Sean Hsuan-Yi Chu
>            Assignee: Sean Hsuan-Yi Chu
>             Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted into calls that fetch all tables from the storage plugins.
> When users have Hive plugged in, the calls to the Hive metadata storage are:
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of query. Besides,
> a more efficient way to fetch tables is to use the get_multi_table call.
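
The batching idea in the description above can be sketched as follows. This is only a minimal illustration under stated assumptions: `MetadataClient`, `CountingClient`, `fetchOneByOne`, and `fetchBatched` are hypothetical names made up for this example. They are not part of Drill or of the Hive metastore client, and the in-memory fake merely counts round trips to show why one request per batch of names may be cheaper than one request per table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedTableLookup {

  /** Hypothetical stand-in for a metastore client; this is NOT the Hive API. */
  interface MetadataClient {
    /** Fetches one table description; costs one round trip. */
    String getTable(String schema, String table);

    /** Fetches several table descriptions; costs one round trip per call. */
    List<String> getTables(String schema, List<String> tables);
  }

  /** In-memory fake that counts round trips so the two strategies can be compared. */
  static final class CountingClient implements MetadataClient {
    private final Map<String, String> store = new HashMap<>();
    int roundTrips = 0;

    CountingClient(String schema, List<String> tableNames) {
      for (String name : tableNames) {
        store.put(schema + "." + name, "TABLE " + schema + "." + name);
      }
    }

    @Override
    public String getTable(String schema, String table) {
      roundTrips++;
      return store.get(schema + "." + table);
    }

    @Override
    public List<String> getTables(String schema, List<String> tables) {
      roundTrips++;
      List<String> found = new ArrayList<>();
      for (String table : tables) {
        String description = store.get(schema + "." + table);
        if (description != null) { // unknown names are skipped
          found.add(description);
        }
      }
      return found;
    }
  }

  /** Issues one request per table name. */
  static List<String> fetchOneByOne(MetadataClient client, String schema, List<String> names) {
    List<String> result = new ArrayList<>();
    for (String name : names) {
      String description = client.getTable(schema, name);
      if (description != null) {
        result.add(description);
      }
    }
    return result;
  }

  /** Issues one request per batch of at most batchSize names. */
  static List<String> fetchBatched(MetadataClient client, String schema,
                                   List<String> names, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<String> result = new ArrayList<>();
    for (int start = 0; start < names.size(); start += batchSize) {
      int end = Math.min(start + batchSize, names.size());
      result.addAll(client.getTables(schema, names.subList(start, end)));
    }
    return result;
  }
}
```

With five table names and a batch size of two, `fetchOneByOne` makes five calls on the fake client while `fetchBatched` makes three, and both return the same descriptions. Whether this translates into a real speedup depends on the actual metastore, which the comment above suggests is not guaranteed.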



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
