hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pengcheng xiong <pxi...@hortonworks.com>
Subject Re: Review Request 60753: Add HLL as an alternative to FM sketch to compute stats
Date Fri, 14 Jul 2017 19:31:45 GMT


> On July 14, 2017, 12:55 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
> > Lines 30-34 (patched)
> > <https://reviews.apache.org/r/60753/diff/2/?file=1775584#file1775584line30>
> >
> >     As discussed this determination should happen using first byte of serialized
string.

That is true. But I would prefer to do it in the next following up jira as this will totally
change the serde of hll as well as fm. And, as I analyzed, I may need to do bitpacking for
fm sketch in order to fit 64k limit for varchar in the worst case.


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180494
-----------------------------------------------------------


On July 14, 2017, 12:08 a.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 14, 2017, 12:08 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 023f084511 
>   common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java PRE-CREATION

>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
92f9a845e3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 18f8afc9ad

>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java
31955b4363 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java
daf85692eb 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java
36b2c9c56b 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java
a88ef84e5c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java
8ac6561aec 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java
2aa4046a46 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java
33c7e3e52c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java
fe890e4e27 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java
3179b23438 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java
c13add9d9c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java
fbdba24b0a 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java
ac65590505 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java
41587477d3 
>   metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java
87b1ac870d 
>   pom.xml 32f5fd1493 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 0a5cf00c44

>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
a05906edfa 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java
e76fc74dbc 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 2ebfcb2360

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java
1c197a028a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java fa70f49857

>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java
601901c163 
>   ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 
>   ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
>   ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada 
>   ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 
>   ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 
>   ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q 8bbae3914d

>   ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 
>   ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d 
>   ql/src/test/results/clientpositive/alter_partition_update_status.q.out 922822e6d2 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 2cc7cbc7b6 
>   ql/src/test/results/clientpositive/alter_table_update_status.q.out e26e8cba1c 
>   ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
>   ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 95dd6abaec 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out f7d73c9ddf 
>   ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef 
>   ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
>   ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
>   ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
>   ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
>   ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 17a912ec13 
>   ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
>   ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
>   ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out f260f034b6 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 
>   ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 
>   ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
>   ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 96feeed49c

>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
>   ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 
>   ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 
>   ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc 
>   ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 
>   ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out faa14ba9c5 
>   ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
>   ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
>   ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
>   ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
>   ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
>   ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 06f8408327 
>   ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out f256ec11bf 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
>   ql/src/test/results/clientpositive/join32.q.out a191284aca 
>   ql/src/test/results/clientpositive/join33.q.out a191284aca 
>   ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
>   ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
>   ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
>   ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
>   ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
>   ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
>   ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out d55cf30331 
>   ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out dc50fb7fc1 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out fab5c9c029 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out ee645410b9 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out f1af97d98d 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 9b71d3ec82 
>   ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
>   ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 7fd7e20e15

>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out c4b18b7118

>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out d4811d64d7

>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
>   ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 
>   ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out d97223c9d0

>   ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c 
>   ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
>   ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
>   ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
>   ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 083bfc301c 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out a59188a46a 
>   ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out b03b96b463

>   ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
>   ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
>   ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
>   ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 99d8d6f664 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
>   ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
>   ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
>   ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 0e9e1207cb

>   ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
>   ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
>   ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
>   ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
>   ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
>   ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 48e58e1bd9

>   ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
>   ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
>   ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out bc44db795a

>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
>   ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
>   ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 626a0640da 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out 273cf06e31 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 04e0f86fe2 
>   ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
>   ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e 
>   ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc 
>   ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
>   ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
>   ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
>   ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 5e0d072095 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out a38e339012

>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out f7c694bf84

>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
>   ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 
>   ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 16b716c4e5 
>   ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 6118c73c2b 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 8df5a64fee 
>   ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 
>   ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 
>   ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 
>   ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 00577620d8

>   ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out c3e5f7c90d 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out
9a1c44c3e6 
>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out
a03466f859 
>   ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 
>   ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out f744eb6513

>   ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out 28a8340a9c

>   ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out 73ab9fca82 
>   ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 
>   ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 3505556029 
>   ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out 3ad09e815c

>   ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out ec4076f908

>   ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out b3d6f039ac

>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out bd024a7ab1

>   ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
>   ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
>   ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
>   ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 
>   ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
>   ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 0a1e039cf1 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 6f5a3a96ca 
>   ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
>   ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message