hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats
Date Tue, 11 Jul 2017 23:20:12 GMT


> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     I am not sure if we need this config. 
> >     Any reason to support storing FM sketch's bitvectors?
> >     
> >     If this is for 3.0.0 release only, we could even remove this config.
> >     
> >     If we need to support FM sketch based NDV then having a separate field in metastore
will be better as Ashutosh suggested.
> 
> pengcheng xiong wrote:
>     This config is just to give user an option to choose whether to use FM or HLL to
COMPUTE ndv. This patch does not change the fact that we do not store bit vectors for both
FM and NDV in object store.
> 
> Prasanth_J wrote:
>     I don't think it will be useful for user to tune this.
>     IMHO, most users won't care a lot about this. This can be used as fallback but this
will only add more checks.

I agree with Prasanth. Overloading this config seems hacky. It is leading to confusing code,
e.g, NDVEstimator uses number of bit vectors to determine ndv algo, thats confusing. Its confusing
for end user too that +ve value from [0,100] is useful for one algo, but -ve value is only
to switch over to diff algo. I suggest that we leave this config as is and introduce a new
config which dictates which algo is used for ndv computation. This config value is than passed
to compute_stats() udaf by ColumnStatsSemanticAnalyzer. GenericUDAFComputeStats than uses
this config to determine which algo its using. 
This config we can store in metastore (either as is or mapped as int) so that we can deserialize
the bit vectors correctly when we retrieve them.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------


On July 10, 2017, 9:29 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 10, 2017, 9:29 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml e6722babd8 
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 7c9d72fbd2 
>   common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimator.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5700fb9325 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig da48a7ccbd 
>   metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
92f9a845e3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java
36b2c9c56b 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java
a88ef84e5c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java
8ac6561aec 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java
2aa4046a46 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java
33c7e3e52c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java
fe890e4e27 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java
3179b23438 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java
c13add9d9c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java
fbdba24b0a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java
ac65590505 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java
41587477d3 
>   pom.xml f9fae59a5d 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
a05906edfa 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java
e76fc74dbc 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java
1c197a028a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java
601901c163 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message