drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Padma Penumarthy (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5972) Slow performance for query on INFORMATION_SCHEMA.TABLE
Date Thu, 16 Nov 2017 01:34:00 GMT
Padma Penumarthy created DRILL-5972:
---------------------------------------

             Summary: Slow performance for query on INFORMATION_SCHEMA.TABLE
                 Key: DRILL-5972
                 URL: https://issues.apache.org/jira/browse/DRILL-5972
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Information Schema
    Affects Versions: 1.11.0
            Reporter: Padma Penumarthy
            Assignee: Padma Penumarthy
             Fix For: 1.13.0


A query like the following on INFORMATION_SCHEMA takes a long time to execute. 

select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from INFORMATION_SCHEMA.`TABLES`
WHERE TABLE_NAME LIKE '%' AND ( TABLE_SCHEMA = 'hive.default' ) ORDER BY TABLE_TYPE, TABLE_CATALOG,
TABLE_SCHEMA, TABLE_NAME; 

Reason being we fetch table information for all schemas instead of just 'hive.default' schema.

If we  change the predicate like this, it executes very fast.

select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from INFORMATION_SCHEMA.`TABLES`
WHERE  ( TABLE_SCHEMA = 'hive.default' ) AND TABLE_NAME LIKE '%'  ORDER BY TABLE_TYPE, TABLE_CATALOG,
TABLE_SCHEMA, TABLE_NAME; 

The difference is in the order in which we evaluate the expressions in the predicate.
In the first case,  we first evaluate TABLE_NAME LIKE '%' and decide that it is inconclusive
(since we do not know the schema). So, we go get all tables for all the schemas.

In the second case, we first evaluate  TABLE_SCHEMA = 'hive.default' and decide that we need
to fetch only tables for that schema.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message