drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Parth Chandra (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases
Date Thu, 04 Aug 2016 21:02:20 GMT
Parth Chandra created DRILL-4826:
------------------------------------

             Summary: Query against INFORMATION_SCHEMA.TABLES degrades as the number of views
increases
                 Key: DRILL-4826
                 URL: https://issues.apache.org/jira/browse/DRILL-4826
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Parth Chandra
            Assignee: Parth Chandra


Queries against INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.VIEWS slow down as the number
of views increases. 

BI tools like Tableau issue a query like the following at connection time:
{code}
select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from INFORMATION_SCHEMA.`TABLES`
WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\' AND TABLE_SCHEMA <> 'sys' AND TABLE_SCHEMA
<> 'INFORMATION_SCHEMA'ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
{code}

The time to query the information schema tables degrades as the number of views increases.
On a test system:

|| Views || Time(secs) ||
|500 | 6 |
|1000 | 19 |
|1500 | 33 |

This can result in a single connection taking more than a minute to establish.

The problem occurs because we read the view file for every view and this appears to take most
of the time.

Querying information_schema.tables does not, in fact, need to open the view file at all, it
merely needs to get a listing of the view files. Eliminating the view file read will speed
up the query tremendously.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message