drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4826) Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases
Date Tue, 20 Sep 2016 17:34:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507218#comment-15507218 ]

ASF GitHub Bot commented on DRILL-4826:
---------------------------------------

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79664962
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      synchronized(mClient) {
    +        try {
    +          eachBulkofTables = mClient.getTableObjectsByName(schemaName, eachBulkofTableNames);
    --- End diff --
    
    + Why not use the helper? The exception handling and reconnect logic here differs from what the helper methods in [DrillHiveMetaStoreClient](https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L222) do.

    + Move this logic to a method in that class?
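
To make the suggestion concrete, here is a minimal sketch of the batching-plus-single-retry loop pulled out into one reusable method. It is not the code from the pull request or from DrillHiveMetaStoreClient: the class `BulkFetchSketch`, the interface `BulkFetcher`, and the method `fetchInBatches` are hypothetical names, and the metastore call is replaced by a callback so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkFetchSketch {

  /** Hypothetical stand-in for the metastore call; not a Drill or Hive interface. */
  public interface BulkFetcher<T> {
    List<T> fetch(List<String> names) throws Exception;
  }

  /**
   * Fetches objects for all names in sub-lists of at most bulkSize entries.
   * Each sub-list is retried once if the first attempt throws.
   */
  public static <T> List<T> fetchInBatches(List<String> names, int bulkSize,
                                           BulkFetcher<T> fetcher) throws Exception {
    if (bulkSize <= 0) {
      throw new IllegalArgumentException("bulkSize must be positive");
    }
    final List<T> result = new ArrayList<>();
    final int total = names.size();
    int from = 0;
    while (from < total) {
      // Clamp the end of the window to the size of the list.
      final int to = from + Math.min(bulkSize, total - from);
      final List<String> batch = names.subList(from, to);
      List<T> fetched;
      try {
        fetched = fetcher.fetch(batch);
      } catch (Exception firstFailure) {
        // Assumption: real code would reconnect the client before retrying.
        // A second failure propagates to the caller.
        fetched = fetcher.fetch(batch);
      }
      result.addAll(fetched);
      from = to;
    }
    return result;
  }
}
```

Keeping the loop in one place would let every caller share the same exception handling and reconnect behaviour instead of each schema class carrying its own variant.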


> Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-4826
>                 URL: https://issues.apache.org/jira/browse/DRILL-4826
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Parth Chandra
>            Assignee: Parth Chandra
>
> Queries against INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.VIEWS slow down as the number of views increases.
> BI tools like Tableau issue a query like the following at connection time:
> {code}
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
> from INFORMATION_SCHEMA.`TABLES`
> WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\'
>   AND TABLE_SCHEMA <> 'sys'
>   AND TABLE_SCHEMA <> 'INFORMATION_SCHEMA'
> ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
> {code}
> The time to query the information schema tables degrades as the number of views increases. On a test system:
> || Views || Time (secs) ||
> |500 | 6 |
> |1000 | 19 |
> |1500 | 33 |
> This can result in a single connection taking more than a minute to establish.
> The problem occurs because we read the view file for every view, and this appears to take most of the time.
> Querying information_schema.tables does not, in fact, need to open the view file at all; it merely needs a listing of the view files. Eliminating the view file read will speed up the query tremendously.
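
As an illustration of the idea in the description above, the sketch below derives view names from a directory listing alone, without opening any file. It rests on assumptions that are not taken from the issue text: view definitions are taken to be files ending in `.view.drill`, the listing uses `java.nio.file` on a local directory rather than whatever file system API Drill actually uses, and `ViewListingSketch` and `listViewNames` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ViewListingSketch {

  // Assumption: view definitions are stored as "<viewName>.view.drill".
  static final String VIEW_SUFFIX = ".view.drill";

  /** Returns the sorted view names found in a directory, using only file names. */
  public static List<String> listViewNames(Path workspaceDir) throws IOException {
    final List<String> names = new ArrayList<>();
    // The glob filters on the file name, so no view file is opened or parsed.
    try (DirectoryStream<Path> stream =
             Files.newDirectoryStream(workspaceDir, "*" + VIEW_SUFFIX)) {
      for (Path p : stream) {
        final String fileName = p.getFileName().toString();
        names.add(fileName.substring(0, fileName.length() - VIEW_SUFFIX.length()));
      }
    }
    Collections.sort(names);
    return names;
  }
}
```

With this approach the cost per view is one directory entry rather than one file read, which is the saving the description is pointing at.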



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
