hive-dev mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Review Request 63442: HIVE-17934 Merging Statistics are promoted to COMPLETE (most of the time)
Date Mon, 13 Nov 2017 15:32:57 GMT


> On Nov. 9, 2017, 7:51 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/auto_sortmerge_join_12.q.out
> > Line 160 (original), 160 (patched)
> > <https://reviews.apache.org/r/63442/diff/2/?file=1886244#file1886244line160>
> >
> >     bucket_small has no stats gathered. This should be NONE.
> 
> Zoltan Haindrich wrote:
>     `hive.stats.autogather` is enabled by default in `HiveConf`.
> 
> Ashutosh Chauhan wrote:
>     Those are load statements, not inserts. We don't gather stats with load statements, only with inserts.
> 
> Zoltan Haindrich wrote:
>     Sorry, you are right: basic stats are not gathered in this case at all.
>     
>     But the stat state is COMPLETE because there is logic that scans the file sizes to calculate the data sizes, and from there HIVE-16811 can guess some row counts.
>     
>     https://github.com/kgyrtkirk/hive/blob/9f67a878512117eb5c251794adc1a91bae62fea7/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L386-L393
>     
>     First, I would like to make the calculations for standalone tables and partitioned tables a bit more similar to each other.
>     
>     I've tried to come up with some definitions for NONE/PARTIAL/COMPLETE; currently I would say the following:
>     
>     * NONE: not known
>         * on table: no information (afaik currently this can't happen)
>         * estimation tree: all nodes in the estimation tree were NONE
>     * PARTIAL:
>         * on table: the current information is estimated from data size
>         * estimation tree: contains at least one NONE/PARTIAL
>     * COMPLETE:
>         * on table: the current information is correct (calculated by the stats tasks)
>         * estimation tree: the whole subtree has COMPLETE status
>     
>     If I use these definitions, then I would say that the filesystem-size-based estimation should be considered PARTIAL.

The definitions sound good. Let's use them to make sure our state calculation logic is built on them.
Can you also add them as code comments?
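As an illustration of how the definitions above could be captured as code comments and turned into logic, here is a minimal Java sketch. The names `StatState`, `forTable`, and `combine` are hypothetical and are not Hive's actual `Statistics` API; the two boolean inputs to `forTable` are an assumed simplification of what the table-level logic knows (whether a stats task produced the numbers, and whether only file sizes are available).

```java
import java.util.List;

/**
 * Hypothetical sketch of the NONE/PARTIAL/COMPLETE definitions discussed in
 * this thread. Not Hive's actual Statistics.State implementation.
 */
enum StatState {
    /** Table: no information. Estimation tree: every node is NONE. */
    NONE,
    /** Table: estimated from data size. Estimation tree: at least one NONE/PARTIAL node. */
    PARTIAL,
    /** Table: exact values computed by a stats task. Estimation tree: whole subtree COMPLETE. */
    COMPLETE;

    /** Table-level state from two assumed (hypothetical) inputs. */
    static StatState forTable(boolean gatheredByStatsTask, boolean estimatedFromFileSize) {
        if (gatheredByStatsTask) {
            return COMPLETE;
        }
        if (estimatedFromFileSize) {
            return PARTIAL; // file-size-based estimation counts as PARTIAL
        }
        return NONE;
    }

    /** Combines the states of all nodes in an estimation (sub)tree. */
    static StatState combine(List<StatState> states) {
        if (states.isEmpty()) {
            throw new IllegalArgumentException("need at least one state");
        }
        boolean allNone = true;
        boolean allComplete = true;
        for (StatState s : states) {
            allNone &= (s == NONE);
            allComplete &= (s == COMPLETE);
        }
        if (allNone) {
            return NONE;
        }
        if (allComplete) {
            return COMPLETE;
        }
        return PARTIAL; // any mix containing NONE or PARTIAL
    }

    public static void main(String[] args) {
        // Table populated by LOAD statements (no stats task), file sizes known.
        StatState loaded = forTable(false, true);
        // Table populated by INSERT, so a stats task gathered basic stats.
        StatState inserted = forTable(true, true);
        System.out.println(loaded);                               // PARTIAL
        System.out.println(combine(List.of(inserted, inserted))); // COMPLETE
        System.out.println(combine(List.of(inserted, loaded)));   // PARTIAL
        System.out.println(combine(List.of(NONE, NONE)));         // NONE
    }
}
```

Keeping the table-level classification separate from the tree combination mirrors the two "on table" / "estimation tree" bullets, so each rule can be documented and tested on its own.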


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63442/#review190633
-----------------------------------------------------------


On Nov. 9, 2017, 5:39 p.m., Zoltan Haindrich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63442/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2017, 5:39 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-17934
>     https://issues.apache.org/jira/browse/HIVE-17934
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> * remove the reactive stat state guessing method
> * make the guessing only work when a new object is created
> * change the way stat objects are merged
> 
> This patch will most probably break almost all qtest outputs.
> 
> 
> Diffs
> -----
> 
>   accumulo-handler/src/test/results/positive/accumulo_queries.q.out b3adf4e504 
>   hbase-handler/src/test/results/positive/hbase_queries.q.out b2eda12e95 
>   hbase-handler/src/test/results/positive/hbasestats.q.out 29eefd43a9 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 7a3fae65e8
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java a4f60accce
>   ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java 8ffb4ce44b 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ce7c96c639 
>   ql/src/test/queries/clientpositive/lateral_view_onview2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/stats_empty_partition2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/acid_table_stats.q.out 351ff0da0a 
>   ql/src/test/results/clientpositive/alterColumnStatsPart.q.out 858e16fe22 
>   ql/src/test/results/clientpositive/annotate_stats_part.q.out 3a94a6a4e3 
>   ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 7875e9693a 
>   ql/src/test/results/clientpositive/cbo_const.q.out e9f885b363 
>   ql/src/test/results/clientpositive/cbo_input26.q.out 77fc194829 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 414b715b7a 
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 683c1e274f 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out a2c6ead293 
>   ql/src/test/results/clientpositive/constGby.q.out c633624935 
>   ql/src/test/results/clientpositive/constant_prop_3.q.out cba4744866 
>   ql/src/test/results/clientpositive/constprog3.q.out f54168d0ee 
>   ql/src/test/results/clientpositive/correlationoptimizer10.q.out a03acd38a7 
>   ql/src/test/results/clientpositive/correlationoptimizer11.q.out cf2250790a 
>   ql/src/test/results/clientpositive/correlationoptimizer13.q.out 6d4f931213 
>   ql/src/test/results/clientpositive/correlationoptimizer14.q.out 149f33fee8 
>   ql/src/test/results/clientpositive/correlationoptimizer15.q.out 2d813b239f 
>   ql/src/test/results/clientpositive/correlationoptimizer5.q.out 68d6a54862 
>   ql/src/test/results/clientpositive/correlationoptimizer7.q.out 82fecab594 
>   ql/src/test/results/clientpositive/correlationoptimizer8.q.out f3cb988a03 
>   ql/src/test/results/clientpositive/correlationoptimizer9.q.out 5372408d2a 
>   ql/src/test/results/clientpositive/cte_mat_5.q.out 3747cec891 
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 8e2e77b077 
>   ql/src/test/results/clientpositive/druid_basic2.q.out 753ccb456f 
>   ql/src/test/results/clientpositive/empty_join.q.out a4a9976a7f 
>   ql/src/test/results/clientpositive/filter_cond_pushdown_HIVE_15647.q.out 779bea3a26
>   ql/src/test/results/clientpositive/groupby_sort_6.q.out a66ec97642 
>   ql/src/test/results/clientpositive/having2.q.out 80301bfc04 
>   ql/src/test/results/clientpositive/input23.q.out 80ee81b654 
>   ql/src/test/results/clientpositive/input26.q.out 1ac082eedf 
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual1.q.out 74f45e58c0 
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual2.q.out 2ac67b294c 
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual3.q.out b8d9b408d7 
>   ql/src/test/results/clientpositive/join_cond_pushdown_unqual4.q.out e5ddc3507f 
>   ql/src/test/results/clientpositive/join_view.q.out 1d83742dd4 
>   ql/src/test/results/clientpositive/lateral_view_onview.q.out 423885e442 
>   ql/src/test/results/clientpositive/lateral_view_onview2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/list_bucket_query_oneskew_2.q.out 876434fb4e 
>   ql/src/test/results/clientpositive/llap/auto_sortmerge_join_12.q.out 3acbb207a7 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 67fe41e223
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw.q.out 1c672ef068
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out a51637a2b9
>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out 02cadb7cff
>   ql/src/test/results/clientpositive/llap/llap_nullscan.q.out 2a891234e5 
>   ql/src/test/results/clientpositive/llap/mapjoin_hint.q.out 505524e78c 
>   ql/src/test/results/clientpositive/llap/mapreduce1.q.out 0e94e71d27 
>   ql/src/test/results/clientpositive/llap/mapreduce2.q.out 6485f587f8 
>   ql/src/test/results/clientpositive/llap/metadataonly1.q.out e6853b23e3 
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate.q.out 65b74ee319 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out c7b98d3967 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out d1579033ac 
>   ql/src/test/results/clientpositive/llap/subquery_null_agg.q.out 78ee174935 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 06a929dd0a 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 514a7889b3 
>   ql/src/test/results/clientpositive/llap/tez_smb_empty.q.out 7a4db158c8 
>   ql/src/test/results/clientpositive/llap/vector_windowing_gby2.q.out ce1881b7fb 
>   ql/src/test/results/clientpositive/llap/vector_windowing_streaming.q.out 61730f59ee
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 3e246bcbe6
>   ql/src/test/results/clientpositive/materialized_view_rewrite_ssb.q.out de491989a5 
>   ql/src/test/results/clientpositive/materialized_view_rewrite_ssb_2.q.out a11d66815a
>   ql/src/test/results/clientpositive/nullgroup3.q.out fe23f39fd8 
>   ql/src/test/results/clientpositive/nullgroup5.q.out 783f6d76b6 
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 44db81a443 
>   ql/src/test/results/clientpositive/perf/spark/query66.q.out 1dc0fac408 
>   ql/src/test/results/clientpositive/perf/spark/query99.q.out c0c5f136ec 
>   ql/src/test/results/clientpositive/position_alias_test_1.q.out ee81a79a0b 
>   ql/src/test/results/clientpositive/ppd_outer_join5.q.out 84c10828ce 
>   ql/src/test/results/clientpositive/ppd_repeated_alias.q.out c94002f37d 
>   ql/src/test/results/clientpositive/row__id.q.out 9aab097f21 
>   ql/src/test/results/clientpositive/semijoin4.q.out 53f6c174bd 
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out 09caf944d2 
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual1.q.out dc9b61e39a
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual2.q.out 82634fba44
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual3.q.out d1b20006b0
>   ql/src/test/results/clientpositive/spark/join_cond_pushdown_unqual4.q.out 2bfc81d275
>   ql/src/test/results/clientpositive/spark/join_view.q.out 61867f75f3 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out d294f4910c 
>   ql/src/test/results/clientpositive/spark/ppd_outer_join5.q.out e49260aa35 
>   ql/src/test/results/clientpositive/spark/semijoin.q.out d2dac10f3f 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out e2f68a02bc 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out d7b445baf8
>   ql/src/test/results/clientpositive/spark/spark_vectorized_dynamic_partition_pruning.q.out 1a8e9ffcc5
>   ql/src/test/results/clientpositive/spark/subquery_in.q.out fd25e36fba 
>   ql/src/test/results/clientpositive/spark/subquery_multi.q.out b91c33ee4a 
>   ql/src/test/results/clientpositive/spark/subquery_null_agg.q.out 945e2a7102 
>   ql/src/test/results/clientpositive/spark/subquery_scalar.q.out 8f3ac0d636 
>   ql/src/test/results/clientpositive/spark/subquery_select.q.out edb2b92f73 
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out f681428785 
>   ql/src/test/results/clientpositive/spark/vectorization_short_regress.q.out 78740fec6f
>   ql/src/test/results/clientpositive/stats_empty_partition2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/subquery_exists_having.q.out ef06dfe697 
>   ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out 79b7d83619 
>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out a202e45be9
>   ql/src/test/results/clientpositive/union_remove_25.q.out 20ab809cb1 
>   ql/src/test/results/clientpositive/union_view.q.out 35f8a9a226 
> 
> 
> Diff: https://reviews.apache.org/r/63442/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>

