hive-dev mailing list archives

From "Chaoyu Tang" <ctang...@gmail.com>
Subject Re: Review Request 38429: HIVE-11786: Deprecate the use of redundant column in column stats related tables
Date Thu, 17 Sep 2015 17:36:46 GMT


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38429/#review99307
-----------------------------------------------------------


On Sept. 17, 2015, 5:35 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38429/
> -----------------------------------------------------------
> 
> (Updated Sept. 17, 2015, 5:35 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Sergey Shelukhin, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-11786
>     https://issues.apache.org/jira/browse/HIVE-11786
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant
> columns such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables
> already have foreign keys (TBL_ID or PART_ID) referencing TBLS or PARTITIONS.
> However, these columns are currently used when fetching column stats (e.g.
> getTableStats/getPartitionStats), so any Hive operation that changes a
> db/table/partition name has to update these columns. That is unnecessary and
> sometimes quite difficult to implement given the limitations of the DN and
> RawStore APIs.
> This patch removes the use of these redundant columns at the HMS code level.
> The changes include:
> 1. Instead of directly using these columns in TAB_COL_STATS and
>    PART_COL_STATS, use the corresponding columns in their referenced tables.
> 2. Currently the CBO code assumes that the column stats returned from HMS are
>    in the same order as the columns passed in the request. That order is not
>    guaranteed, so the CBO code has been changed.
> 3. The deprecated redundant columns are now temporarily populated with the
>    value "Deprecated". They will be removed in a follow-up JIRA.
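
To illustrate item 2, here is a minimal sketch of matching returned column stats to the requested columns by name rather than by position. It is not the code from this patch: the `ColStatsByName` class, the `ColStat` type, and the `alignToRequest` method are hypothetical names chosen for illustration, and a column with no stats is assumed to map to `null`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColStatsByName {

    /** Hypothetical stand-in for a column statistics object returned by the metastore. */
    public static final class ColStat {
        public final String colName;
        public final long numNulls;

        public ColStat(String colName, long numNulls) {
            this.colName = colName;
            this.numNulls = numNulls;
        }
    }

    /**
     * Returns one entry per requested column, in request order.
     * The returned stats are looked up by column name, so their order
     * in the metastore response does not matter. A requested column
     * with no stats yields null (an assumption of this sketch).
     */
    public static List<ColStat> alignToRequest(List<String> requestedCols,
                                               List<ColStat> returned) {
        // Index the response by column name.
        Map<String, ColStat> byName = new HashMap<>();
        for (ColStat stat : returned) {
            byName.put(stat.colName, stat);
        }
        // Walk the request in order and pick the matching stat for each column.
        List<ColStat> aligned = new ArrayList<>(requestedCols.size());
        for (String col : requestedCols) {
            aligned.add(byName.get(col));
        }
        return aligned;
    }
}
```

A caller that previously indexed the response by position would instead call `alignToRequest(requestedCols, returned)` and handle `null` entries for columns without stats.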
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 1f89b7c
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4d6bfcc
>   metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java b3ceff1
>   metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 328a65c
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 2967a60
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java 132f7a1
>   metastore/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java 7e46523
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 6c0bd25
> 
> Diff: https://reviews.apache.org/r/38429/diff/
> 
> 
> Testing
> -------
> 
> 1. Manually tested some cases against MySQL/PostgreSQL/Oracle.
> 2. The precommit tests are currently running.
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>

