hive-dev mailing list archives

From "Chaoyu Tang" <>
Subject Review Request 38429: HIVE-11786: Deprecate the use of redundant column in colunm stats related tables
Date Wed, 16 Sep 2015 12:37:29 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for hive, Ashutosh Chauhan, Sergey Shelukhin, and Xuefu Zhang.

Bugs: HIVE-11786

Repository: hive-git


The column stats tables, such as TAB_COL_STATS and PART_COL_STATS, carry redundant columns (DB_NAME,
TABLE_NAME, PARTITION_NAME) even though they already have foreign keys (TBL_ID or PART_ID)
referencing TBLS or PARTITIONS. Because these columns are currently used when fetching column
stats (e.g. getTableStats/getPartitionStats), any Hive operation that changes a db/table/partition
name has to update them as well, which is unnecessary and sometimes quite difficult
to implement given the limitations of the DN and RawStore APIs.
This patch removes the use of these redundant columns at the HMS code level. The changes are:
1. Instead of reading these columns directly from TAB_COL_STATS and PART_COL_STATS, use the
corresponding columns in their referenced tables.
2. Currently the CBO code assumes that the column stats returned from HMS are in the same
order as the columns passed in the request. That order is not guaranteed, so the CBO code
has been changed to no longer rely on it.
3. The deprecated redundant columns are now temporarily populated with the value "Deprecated".
They will be removed in a follow-up JIRA.
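
To illustrate change 2, here is a minimal sketch of matching returned column stats to the
requested columns by name rather than by position. It is not the code in this patch:
ColumnStatsOrdering, ColStat and alignToRequest are hypothetical names made up for the
example, and the case-insensitive name comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ColumnStatsOrdering {

    /** Hypothetical stand-in for a column stats record; not Hive's own stats class. */
    public static final class ColStat {
        public final String colName;
        public final long numDistinct;

        public ColStat(String colName, long numDistinct) {
            this.colName = colName;
            this.numDistinct = numDistinct;
        }
    }

    /**
     * Returns the stats in the order of requestedCols, looking each one up by
     * column name. A column with no stats yields a null entry, so the result
     * always has the same size as the request.
     */
    public static List<ColStat> alignToRequest(List<String> requestedCols,
                                               List<ColStat> returned) {
        // Index the returned stats by name (assumption: names compare case-insensitively).
        Map<String, ColStat> byName = new HashMap<>();
        for (ColStat stat : returned) {
            byName.put(stat.colName.toLowerCase(Locale.ROOT), stat);
        }
        // Walk the request in order and pick the matching stats, if any.
        List<ColStat> aligned = new ArrayList<>(requestedCols.size());
        for (String col : requestedCols) {
            aligned.add(byName.get(col.toLowerCase(Locale.ROOT)));
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("id", "name");
        // The stats come back in a different order than requested.
        List<ColStat> returned = List.of(new ColStat("name", 7L), new ColStat("id", 100L));
        for (ColStat stat : alignToRequest(requested, returned)) {
            System.out.println(stat.colName + " -> " + stat.numDistinct);
        }
    }
}
```

Looking stats up by name means the caller does not depend on the order in which the
metastore returns them, and a requested column without stats shows up as an explicit gap
instead of shifting the remaining entries.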


  metastore/src/java/org/apache/hadoop/hive/metastore/ 1f89b7c 
  metastore/src/java/org/apache/hadoop/hive/metastore/ 4d6bfcc 
  metastore/src/java/org/apache/hadoop/hive/metastore/ b3ceff1 
  metastore/src/test/org/apache/hadoop/hive/metastore/ 7e46523 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/ 6c0bd25 



1. Manually tested some cases against MySQL/PostgreSQL/Oracle.
2. The precommit tests are currently running.


Chaoyu Tang
