hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jesús Camacho Rodríguez <jcamachorodrig...@hortonworks.com>
Subject Re: Review Request 59998: HIVE-16867
Date Wed, 21 Jun 2017 08:48:25 GMT


> On June 21, 2017, 2:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedScanOptimizer.java
> > Line 77 (original), 108 (patched)
> > <https://reviews.apache.org/r/59998/diff/1/?file=1747954#file1747954line108>
> >
> >     Shall rename this to SharedWorkOptimizer.

I changed this. Should I change the name of the HiveConf property too since we have not had
a release since it was introduced (I think it will help avoiding confusion)?


> On June 21, 2017, 2:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out
> > Lines 2242-2252 (original), 2242-2246 (patched)
> > <https://reviews.apache.org/r/59998/diff/1/?file=1747965#file1747965line2242>
> >
> >     Is this change expected? Filter is removed from both branches. Technically,
this is correct since ds is partition column, but logic to remove filters on partition column
is in PartitionConditionRemover. Does SharedScan has logic to remove redundant filters in
tree?

Actually there were no Filter operators before SharedScanOptimizer was introduced. These Filter
operators were introduced by the SharedScanOptimizer, as previous version was pushing the
filter expressions above the Scan when it was reusing it. Currently, as logic for reutilization
has changed, it does not do that anymore, which is a good thing.


> On June 21, 2017, 2:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/except_distinct.q.out
> > Lines 421-425 (original), 413-417 (patched)
> > <https://reviews.apache.org/r/59998/diff/1/?file=1747966#file1747966line421>
> >
> >     Missed opportunity. Could have removed this Gby operator as well.

In this case, we have three branches.
TS[0]-SEL[1]-GBY[3]-RS[4]-...
TS[27]-SEL[28]-GBY[30]-RS[31]-...
TS[47]-SEL[48]-GBY[50]-RS[51]-...

The first two branches merge up to Select operator, as GBY operators are different (_count(2)_
is considered different to _count(1)_ as we do not do any special reasoning around expressions
right now).
TS[0]-SEL[1]-GBY[3]-RS[4]-...
            -GBY[30]-RS[31]-...
TS[47]-SEL[48]-GBY[50]-RS[51]-...

Then, we try to merge the third branch. However, currently there is a limitation in the implementation
where follow-up reutilizations (third branch in this case) cannot go further that the last
reused operator (SEL[1] in this case). That is why we end up with following plan:
TS[0]-SEL[1]-GBY[3]-RS[4]-
            -GBY[30]-RS[31]-
            -GBY[50]-RS[51]-

I think I can lift that restriction: I introduced it because multi-branching of the children
makes the code more complicated (we would have to follow different branches and end up using
a heuristic to choose which one to reuse, e.g., longest path?) and I was already changing
almost completely the logic in SharedScanOptimizer.
However, I think this is something that can be tackled in a follow-up (in fact, I had already
left TODO comments in the code, see _extractSharedOptimizationInfo_ in SharedScanOptimizer).

Btw, another idea for a follow-up: rewrite all _count_ on constant to the same expression
in a Calcite rule so 1) we can prune more columns, and 2) find more equivalences for reutilization
and MVs rewriting.


> On June 21, 2017, 2:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/subquery_scalar.q.out
> > Lines 824-839 (original), 819-830 (patched)
> > <https://reviews.apache.org/r/59998/diff/1/?file=1747979#file1747979line824>
> >
> >     Future improvement: Zip together 2 Gbys with different aggregations in a single
Gby with projects after it projecting diff aggregates for diff branches.

I can create a follow-up for this too. However, I think the question that remains is how much
further we want to take this rule vs implementing proper Spool operator at the logical level.
The complexity of this kind of extension to the existing code is that currently we do not
rewrite the operator tree in any way, we just reutilize what is already there; this extension
would imply creating/inserting new operators.


> On June 21, 2017, 2:03 a.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/subquery_scalar.q.out
> > Lines 2491-2509 (original), 2482-2495 (patched)
> > <https://reviews.apache.org/r/59998/diff/1/?file=1747979#file1747979line2491>
> >
> >     Gbys doing diff aggregations but on same key can be further zipped.

Same comment as above.


- Jesús


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59998/#review178443
-----------------------------------------------------------


On June 12, 2017, 10:42 a.m., Jesús Camacho Rodríguez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59998/
> -----------------------------------------------------------
> 
> (Updated June 12, 2017, 10:42 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16867
>     https://issues.apache.org/jira/browse/HIVE-16867
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16867
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java d5006bd52db42e3bb2b650e099d656746834b497

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
a9099b868001f4a917b91540fb305db25ccac664 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 1da91641b989a43b4ce5f6a09e0eabe6487e9157

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedScanOptimizer.java e31119fd081b5989e64e00c6d903c53040dfd8d3

>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java a9c1e61ba94574d786c3be912a0b4f9eab20db96

>   ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPruningEventDesc.java 73bbebd888c508cf7add51287326ba133df9e4d3

>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 04686f7fb98b558d7c7d671341f6e121e7c282c6

>   ql/src/test/results/clientpositive/llap/auto_join0.q.out 6d051ea3f5117c043f79cfc9c641f1a8ace3c87e

>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 90c4241f192cfe8f40acc8a1fc0e19fbc846bd97

>   ql/src/test/results/clientpositive/llap/auto_sortmerge_join_9.q.out bdb30d735bc481b11fa6b44563ba74861841071b

>   ql/src/test/results/clientpositive/llap/bucket_map_join_tez1.q.out 042c60bf17f4d696e771dfb24d7f7a4fc146f309

>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out cdae4fbfbf107394ccd58e7d9d1d2aeaeaa285ec

>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out 3e715463a48a18986a82c4e55d8c8576821b0a05

>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 82dae9afbb3ae101b7b7cf8214ecfdb08f9d5d34

>   ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out 2875e13bbaccb4e5bfee3e682e5b2c88acaf31a5

>   ql/src/test/results/clientpositive/llap/except_distinct.q.out e4c2941f67858e0171d00032945e4e7d2b7500c9

>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 8b04bc9261478fee17b82c780a161410a704f4ac

>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out e3f70b097f11255276ffc80082c6c8c115a76a1e

>   ql/src/test/results/clientpositive/llap/intersect_merge.q.out a31296672023080cc91865fbb86cdbeb891b2167

>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 57594e016484ea9020b349339a8a9be63709a6cb

>   ql/src/test/results/clientpositive/llap/mrr.q.out 726349c7f16823f1ff54a8575d8d8a27a50f1772

>   ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out b4b0e93c82d73c60bcfe345cea1cb9c8e26ab145

>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out c89ca6b5cd9fbb26cbf0b416e4ca71ee38d32a3f

>   ql/src/test/results/clientpositive/llap/partition_shared_scan.q.out 34ba87cc9158b2d23ce8d3c6b3defd2b8a4d36f0

>   ql/src/test/results/clientpositive/llap/subquery_in.q.out 1f9c9e447416c153f909ebbe960a8343454dde14

>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 29516eff82a63a1c551a4173ea1e7ea640bdcef0

>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out b4af91579bb44cd30503d48de72a2f8f36e98501

>   ql/src/test/results/clientpositive/llap/subquery_null_agg.q.out bff27810e9b0999a3794b1fd505bb602236d40ad

>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out e94edff262685a2e2e3c4ab4ba6382b5d2cc186d

>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 202980e975b4bdd988df856bc1e533990e5c664b

>   ql/src/test/results/clientpositive/llap/subquery_views.q.out 1a21a02a309d7fc931513d2c89f0aee3bc8c0fc6

>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out b4b601993b6eddbe3a62a917267e214a1c09ceef

>   ql/src/test/results/clientpositive/llap/union_top_level.q.out 2fac8ccf0c8f0c1117874c502c2970d48060e476

>   ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets4.q.out e1ad06c7de5d4fe729ba56ea89be1a55c6b4d487

>   ql/src/test/results/clientpositive/llap/vector_join30.q.out ec767507003f09d93f8491f454abf2351bd4226b

>   ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out
d9fc6b5f5834a041d1f19495234c30948aca2751 
>   ql/src/test/results/clientpositive/perf/query1.q.out da4a65c86286555f63b25b8e63117d1df5430bc4

>   ql/src/test/results/clientpositive/perf/query10.q.out 9b6621c1aa03728ea2736474d23bf0b89e42d8f6

>   ql/src/test/results/clientpositive/perf/query14.q.out 048a17f92f67c4ca78ab1fcfcdbc4572d784e39a

>   ql/src/test/results/clientpositive/perf/query16.q.out a7f93f9ec28f38675429429c1668ae7a09b0e5c0

>   ql/src/test/results/clientpositive/perf/query2.q.out 50d7f7bcfacc81cdc5e3d7485d9ea2c5b5bf1d52

>   ql/src/test/results/clientpositive/perf/query23.q.out 1fd8cb4f259a12e4b805843e4808888575b055bb

>   ql/src/test/results/clientpositive/perf/query24.q.out 2aa0c1929879f6d95b845477b8eb40a227e2e21b

>   ql/src/test/results/clientpositive/perf/query30.q.out a86f11d92226e36da0a306b638d504d575aca9a3

>   ql/src/test/results/clientpositive/perf/query31.q.out b3a74f3cae6ef0c9c8465c02ce0b9de673f27cf6

>   ql/src/test/results/clientpositive/perf/query32.q.out 66b848564e55d64c10771e2fb1e835b0c6e453b8

>   ql/src/test/results/clientpositive/perf/query33.q.out c1a5fa28edb5ff94339d43d886693a486d60df31

>   ql/src/test/results/clientpositive/perf/query35.q.out b286a07571e767ddb2e968f58cd7027de063ddf2

>   ql/src/test/results/clientpositive/perf/query38.q.out ae9ada5c0eee11040333d1f2140e0b004cb82c35

>   ql/src/test/results/clientpositive/perf/query39.q.out cf139f285da4b5eb89ffa898e79d9ac614315ae6

>   ql/src/test/results/clientpositive/perf/query4.q.out 1b2048649a8de385ff14cd83df6e9a612c93ab2b

>   ql/src/test/results/clientpositive/perf/query44.q.out 566548089c4c7ea54f5631fb97a3497dc7653573

>   ql/src/test/results/clientpositive/perf/query46.q.out 680670329551ddb98fe9c53a3aa9b7ae9501fcd6

>   ql/src/test/results/clientpositive/perf/query47.q.out d5e192259b3761f4c2561d356b9942324facf0d8

>   ql/src/test/results/clientpositive/perf/query49.q.out 8b8ad8b5c27cf7bf7a03ca0bdac80025078a7cce

>   ql/src/test/results/clientpositive/perf/query5.q.out a3f2d58fecbe701a744c5b1fbeeaf03b54d403b8

>   ql/src/test/results/clientpositive/perf/query51.q.out 0ce3e9f6dbb1c2b03c9cee7a5d689766cb87e966

>   ql/src/test/results/clientpositive/perf/query54.q.out 3cbcbe33f9cc2ab7e6dd90bf1a3692aeb5e2c6ab

>   ql/src/test/results/clientpositive/perf/query56.q.out 4ec7201fa7ca1530dcedb23d5ecb87579c0f6b98

>   ql/src/test/results/clientpositive/perf/query57.q.out 6c237bfd0b50e549d2492d7a06ea9e3ae81c1179

>   ql/src/test/results/clientpositive/perf/query58.q.out acdfc077183683e070067dcccd59b38ba94e737b

>   ql/src/test/results/clientpositive/perf/query59.q.out b570b96376b1ec6c97c2ce82c459472e435716f3

>   ql/src/test/results/clientpositive/perf/query60.q.out 12d8cdd9b40747e1da280309452e81d15fcde471

>   ql/src/test/results/clientpositive/perf/query61.q.out 683833265b50c0463aa4d2f8ba6ff0ea2763c7fe

>   ql/src/test/results/clientpositive/perf/query64.q.out f24b14d4408e4412b4161626523bc278ed1367d5

>   ql/src/test/results/clientpositive/perf/query65.q.out b2035c2a98a936fd6bdf94c472a41bd54515c685

>   ql/src/test/results/clientpositive/perf/query66.q.out 2c748151c96b41755c3e02422c3a7166cb00b47d

>   ql/src/test/results/clientpositive/perf/query68.q.out bd9b5ecd975e0fe4694f0b5555b5b39ac5a9dd1d

>   ql/src/test/results/clientpositive/perf/query69.q.out a55c36887da6c1e8d1816acdeb4a82ffdabd7e3d

>   ql/src/test/results/clientpositive/perf/query70.q.out ee1fe86ff35bfb00a7a1f992fff954227215bb04

>   ql/src/test/results/clientpositive/perf/query75.q.out d399567ab9cc7cb436c7be5cff70f861f2303092

>   ql/src/test/results/clientpositive/perf/query76.q.out dcd5004166e9b74a1b6371af70b12e2e13131832

>   ql/src/test/results/clientpositive/perf/query77.q.out d46ba6b13c28733073b9af1d9170787b77151f16

>   ql/src/test/results/clientpositive/perf/query78.q.out b49103f1e80dd25eae23a7c9eeef724c5a49e5ee

>   ql/src/test/results/clientpositive/perf/query80.q.out 3cf41f3fed326fbe91618aacf5a7ba1f9b44248d

>   ql/src/test/results/clientpositive/perf/query81.q.out abeb57759ce66fb3f935b9accb43861d968226a9

>   ql/src/test/results/clientpositive/perf/query83.q.out 396c423c3180dd1d791f4f55bc61e7f2299486a1

>   ql/src/test/results/clientpositive/perf/query85.q.out 86b961b1cd4a1fdad7a0eed788089190d8e60556

>   ql/src/test/results/clientpositive/perf/query87.q.out 58a33d9e67c1d841c8eab67c79cd348afeaf4f24

>   ql/src/test/results/clientpositive/perf/query88.q.out 18d0a774ca80c4ac97490539170eaaec1c3c72ac

>   ql/src/test/results/clientpositive/perf/query90.q.out b3468ecce54a132d06fd55555ef0ec4b0ca54293

>   ql/src/test/results/clientpositive/perf/query92.q.out ec4fbb9a5a0c7445f6ad4b6b77aadc8714be5e5c

>   ql/src/test/results/clientpositive/perf/query94.q.out c5fc9e7f000501c6e839a216761119faae5f5e72

>   ql/src/test/results/clientpositive/perf/query95.q.out 332bef83c5914e09d0d3d520caf43d0d6bff0ffd

>   ql/src/test/results/clientpositive/perf/query97.q.out 2d5129a9015df0450aefdb4a30d688bcb3bef950

>   ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out f6844c4a38bf1f7df457e51af5eb54cfde82bbb9

> 
> 
> Diff: https://reviews.apache.org/r/59998/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message