hive-dev mailing list archives

From "pengcheng xiong" <>
Subject Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
Date Mon, 06 Apr 2015 21:21:58 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated April 6, 2015, 9:21 p.m.)

Review request for hive and Ashutosh Chauhan.

Repository: hive-git


The discrepancy arises because the NDV (number of distinct values) calculation for a
partitioned table assumes that the NDV range is contained within each partition, and is
calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS". This is problematic for
columns like ticket number, whose values naturally increase with the partitioning date
column ss_sold_date_sk.
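
To make the underestimate concrete, here is a minimal sketch in Python. The partition names and value ranges are made up for illustration, and the code only mimics the max(NUM_DISTINCTS) style of estimate described above; it does not show the metastore code or the approach taken in this patch.

```python
# Hypothetical data: ticket numbers grow with the partition date, so each
# partition holds a disjoint range of values (an assumption for illustration).
partitions = {
    "ss_sold_date_sk=1": range(0, 1000),
    "ss_sold_date_sk=2": range(1000, 2000),
    "ss_sold_date_sk=3": range(2000, 3000),
}

# Per-partition NDV, standing in for NUM_DISTINCTS in PART_COL_STATS.
per_partition_ndv = {name: len(set(values)) for name, values in partitions.items()}

# Estimate in the style of "select max(NUM_DISTINCTS) from PART_COL_STATS".
max_estimate = max(per_partition_ndv.values())

# Actual NDV over the whole table, computed from the raw values.
true_ndv = len(set().union(*partitions.values()))

# Summing per-partition NDVs gives an upper bound; it is exact here only
# because the partitions do not share any values.
sum_upper_bound = sum(per_partition_ndv.values())

print(max_estimate, true_ndv, sum_upper_bound)
```

With disjoint ranges like these, the max-based estimate equals the NDV of a single partition, while the table-wide NDV grows with the number of partitions.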

Diffs (updated)

  common/src/java/org/apache/hadoop/hive/conf/ cc16c38 
  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/ 74f1b01
  metastore/src/java/org/apache/hadoop/hive/metastore/ 7fc04f1
  metastore/src/java/org/apache/hadoop/hive/metastore/ d404789 
  metastore/src/java/org/apache/hadoop/hive/metastore/ 6956e3b 
  metastore/src/java/org/apache/hadoop/hive/metastore/ 475883b 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION




pengcheng xiong
