hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Dere <jd...@hortonworks.com>
Subject Re: Review Request 63427: HIVE-17396
Date Mon, 30 Oct 2017 23:19:04 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189657
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 1417 (original), 1417 (patched)
<https://reviews.apache.org/r/63427/#comment266862>

    Trying to think about how to think about this setting if we're going to use this for tuning.
I think a better way of being able to think about this setting is, what kind of selectivity
we want from the semijoin reduction before we decide it is worth keeping. For me this setting
might be a bit more intuitive (basically float value between 0-1) - for example setting config
to 0.5, compared to what you have now, where I think you would set it to 2.0.



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1419 (patched)
<https://reviews.apache.org/r/63427/#comment266860>

    Some concerns about long-to-float conversion going on here .. nDVs is cast to float then
multiplied to 1.0, and nDVsOfTS is also converted to float during this comparison. This could
affect the comparisoin results. Maybe cast nDVsFactored to long when doing the comparision
to be safe?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1421 (patched)
<https://reviews.apache.org/r/63427/#comment266863>

    If you are logging here, mention that setShouldRemove is being set. Would be more useful
for someone looking at the logs.


- Jason Dere


On Oct. 30, 2017, 7:09 p.m., Deepak Jaiswal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2017, 7:09 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not have equality
as there is a chance that the values are same on both sides and the branch is still marked
as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side are only
slightly less than that on TS side.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 1a1a4d9b2d

> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message