drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jinfeng Ni" <...@maprtech.com>
Subject Re: Review Request 30959: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison.
Date Wed, 18 Feb 2015 03:32:28 GMT


> On Feb. 13, 2015, 3:58 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java,
line 30
> > <https://reviews.apache.org/r/30959/diff/1/?file=862732#file862732line30>
> >
> >     Can you add javadoc comments to describe what this visitor is doing.

Add javadoc comments.


> On Feb. 13, 2015, 3:58 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java,
line 59
> > <https://reviews.apache.org/r/30959/diff/1/?file=862732#file862732line59>
> >
> >     We should have a percentage threshold to do the comparison.. i.e swap only if
right side is 10% more than left side.  Also, I feel that we should have some option where
we don't put intermediate result on the right side of the join, because the estimates on the
intermediate result is highly unpredictable.

Add another option to set the percentage threshold. The default is 10%.

I did not add option where we don't put intermediate result on the right side of the join.
It's true that the intermediate result is not accurate, and the decision based on that may
not lead to a plan that we want. However, the current cost-based optimizer also heavily replies
on the intermediate result estmiates. In that sense, I feel both of approachs would face the
same issue. 

Also, I'm thinking to disable this feature by default. Only when users see performance hit
because of hash join order, they could turn on the option. This way, it gives user a workaround,
and avoid the overhead to either re-write the query, or mannully modify the JSON physical
plan.


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30959/#review72454
-----------------------------------------------------------


On Feb. 12, 2015, 5:10 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30959/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2015, 5:10 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> See https://issues.apache.org/jira/browse/DRILL-2236#.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
a3c42de 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 3541db7

>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java
394a82c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
faa8546 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/explain/NumberingRelWriter.java
6522ad9 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SwapHashJoinVisitor.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
6b3d301 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
aa0a5ad 
> 
> Diff: https://reviews.apache.org/r/30959/diff/
> 
> 
> Testing
> -------
> 
> Unit test done. 
> 
> Other regression workloads are running. Will update status once complete.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message