hive-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19605) TAB_COL_STATS table has no index on db/table name
Date Tue, 22 May 2018 21:37:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484623#comment-16484623 ]

Todd Lipcon commented on HIVE-19605:
------------------------------------

It seems like this table can also be queried as part of a get_table call. Oddly, the query being
generated is:

SELECT 'org.apache.hadoop.hive.metastore.model.MTableColumnStatistics' AS NUCLEUS_TYPE,`A0`.`AVG_COL_LEN`,`A0`.`COLUMN_NAME`,`A0`.`COLUMN_TYPE`,`A0`.`DB_NAME`,`A0`.`BIG_DECIMAL_HIGH_VALUE`,`A0`.`BIG_DECIMAL_LOW_VALUE`,`A0`.`DOUBLE_HIGH_VALUE`,`A0`.`DOUBLE_LOW_VALUE`,`A0`.`LAST_ANALYZED`,`A0`.`LONG_HIGH_VALUE`,`A0`.`LONG_LOW_VALUE`,`A0`.`MAX_COL_LEN`,`A0`.`NUM_DISTINCTS`,`A0`.`NUM_FALSES`,`A0`.`NUM_NULLS`,`A0`.`NUM_TRUES`,`A0`.`TABLE_NAME`,`A0`.`CS_ID`
FROM `TAB_COL_STATS` `A0` WHERE `A0`.`DB_NAME` = '';

(note the empty db_name).

Given the lack of an index, this takes 450ms on the HMS instance I am testing (with the MySQL query
cache disabled).
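
As a rough illustration of why the missing index matters, the sketch below builds a
cut-down TAB_COL_STATS table in an in-memory SQLite database and compares the query plan
for a (CAT_NAME, DB_NAME, TABLE_NAME) lookup before and after adding a composite index.
This is an assumption-laden stand-in, not the metastore's actual schema or upgrade script:
SQLite replaces MySQL, only a few columns are modelled, the row data is synthetic, and the
index name TAB_COL_STATS_IDX and the query_plan helper are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the textual description.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Simplified stand-in for the metastore table (assumption: only key columns kept).
conn.execute(
    "CREATE TABLE TAB_COL_STATS ("
    " CS_ID INTEGER PRIMARY KEY,"
    " CAT_NAME TEXT, DB_NAME TEXT, TABLE_NAME TEXT, COLUMN_NAME TEXT)"
)

# Synthetic rows spread over 50 databases and 500 tables.
conn.executemany(
    "INSERT INTO TAB_COL_STATS (CAT_NAME, DB_NAME, TABLE_NAME, COLUMN_NAME)"
    " VALUES (?, ?, ?, ?)",
    [("hive", f"db{i % 50}", f"t{i % 500}", f"c{i}") for i in range(5000)],
)

lookup = (
    "SELECT CS_ID FROM TAB_COL_STATS"
    " WHERE CAT_NAME = ? AND DB_NAME = ? AND TABLE_NAME = ?"
)
args = ("hive", "db1", "t1")

# Without an index the planner has to scan every row.
plan_before = query_plan(conn, lookup, args)

# Hypothetical index name; the column order matches the lookup tuple.
conn.execute(
    "CREATE INDEX TAB_COL_STATS_IDX"
    " ON TAB_COL_STATS (CAT_NAME, DB_NAME, TABLE_NAME)"
)

# With the index the planner can seek directly to the matching rows.
plan_after = query_plan(conn, lookup, args)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions and from MySQL's EXPLAIN output,
but the first plan should report a full scan and the second a search using the index.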

> TAB_COL_STATS table has no index on db/table name
> -------------------------------------------------
>
>                 Key: HIVE-19605
>                 URL: https://issues.apache.org/jira/browse/HIVE-19605
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Todd Lipcon
>            Priority: Major
>
> The TAB_COL_STATS table is missing an index on (CAT_NAME, DB_NAME, TABLE_NAME). The getTableColumnStatistics
> call queries based on this tuple. This makes those queries take a significant amount of time
> in large metastores since they do a full table scan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
