drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha" <asi...@maprtech.com>
Subject Re: Review Request 33523: DRILL-1957: Support nested loop join planning (for NOT-IN, uncorrelated EXISTS, Inequality)
Date Wed, 29 Apr 2015 01:39:04 GMT


> On April 27, 2015, 3:46 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java,
line 42
> > <https://reviews.apache.org/r/33523/diff/1/?file=941039#file941039line42>
> >
> >     Equality join includes both EQUALS and IS_NOT_DISTINCT_FROM comparator, right?

yes, currently I am not distinguishing between those for the join categorization...perhaps
I should but will do this as part of refactoring to use JoinInfo (based on Julian's suggestion
below).


> On April 27, 2015, 3:46 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java,
line 43
> > <https://reviews.apache.org/r/33523/diff/1/?file=941039#file941039line43>
> >
> >     If a join has both equality and inequality condition, which category will this
join fall in? 
> >     
> >     e.g: 
> >       T1.c1 = T2.C2 and T1.C3 > T1.C4
> 
> Julian Hyde wrote:
>     Rather than JoinCategory, consider modeling joins the same way as Calcite's JoinInfo.
A join is decomposed into 0 or more equi-join keys and a remaining condition. So, you can
easily tell whether a join is equi, theta, cartesian, or hybrid.
>     
>     You often want to handle hybrid expressions like "T1.c1 = T2.C2 and T1.C3 > T1.C4"
as an equi-join plus a remaining condition (e.g. you want to implement using hash-join followed
by filter).
>     
>     Ideally this code would end up in Calcite at some point. Adopting the same core concepts
will help.
>     
>     Regarding the above remark: Calcite does not recognize IS_NOT_DISTINCT_FROM as an
equi condition, but it ought to.

Agree.. makes sense to use JoinInfo.  Given the time constaints for 0.9 release, I would like
to get this in as-is and make the changes as part of refactoring for the 1.0 release.  I will
file a JIRA to keep track of it.


> On April 27, 2015, 3:46 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java,
line 121
> > <https://reviews.apache.org/r/33523/diff/1/?file=941039#file941039line121>
> >
> >     Why do we treat DrillFilterRel differently from other Rel node?

Currently, I am limiting the scalar check for either subquery with scalar aggregate or a filter
on top of the scalar subquery.  It could be generalized to non-filter Rel in the future.


> On April 27, 2015, 3:46 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java,
line 145
> > <https://reviews.apache.org/r/33523/diff/1/?file=941039#file941039line145>
> >
> >     Is it good enough to have just the fist condition? If the remaining is not always
true, that means it's inequality join?

I preserved whatever we were doing earlier in DrillJoinRelBase.computeSelfCost().  This code
will get obsolete once I do the refactoring to use JoinInfo.


> On April 27, 2015, 3:46 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java,
line 87
> > <https://reviews.apache.org/r/33523/diff/1/?file=941040#file941040line87>
> >
> >     For alwaysTrue condition, is it the case that the join cardinality will be simply
the product of both inputs?

For pure cartesian it is the right estimate.  For inequality conditions, it is an approximation
but a reasonable estimate.


> On April 27, 2015, 3:46 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java,
line 239
> > <https://reviews.apache.org/r/33523/diff/1/?file=941044#file941044line239>
> >
> >     Please add comment here, explian why NLJ will be enabled, only when broadcast
Join is eanbled as well.

Added comment.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33523/#review81663
-----------------------------------------------------------


On April 24, 2015, 5:37 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33523/
> -----------------------------------------------------------
> 
> (Updated April 24, 2015, 5:37 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Bugs: DRILL-1957
>     https://issues.apache.org/jira/browse/DRILL-1957
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> After Jinfeng's rebasing of Drill on Calcite master branch, the new Calcite logical plan
for NOT-IN consists of a cartesian join with a scalar subquery.  In order to support this,
we need nested loop join.  This patch adds support for such plans and works in complement
with execution side changes already done earlier. 
> 
> The check for scalar subquery is restricted to scalar aggregates.  Note that a Filter
may appear after a scalar aggregate, hence we also check for that. 
> 
> NL Join plan will do a broadcast of the right (scalar) child.  The join condition for
NL join is always TRUE; however if there is a filter condition present, the planner will create
a Filter-NLJ plan where the filter is done right after NLJ.  In this way, it is possible to
test the NLJ for equality joins also; in order to support that, an option 'planner.enable_nljoin_for_scalar_only'
 has been provided.  The default is TRUE. 
> 
> Besides NOT-IN, other SQL constructs that will be enabled are uncorrelated EXISTS, cartesian
joins (involving scalars unless the above option is enabled) and inequality joins (also involving
scalars).
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
41bf786 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java
8dc5cf1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterRel.java
a914f47 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillJoinRel.java
1f602c7 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillJoinRule.java
f832dfe 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
532fd43 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
aca55a0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrule.java
24df0b1 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPrel.java 59b9f41

>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java
d6f1672 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java
3c0022f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrule.java
cbcc920 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrel.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
ac86c4a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
a394efe 
>   exec/java-exec/src/test/java/org/apache/drill/TestDisabledFunctionality.java e049943

>   exec/java-exec/src/test/java/org/apache/drill/TestTpchDistributed.java 2b41912 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java
PRE-CREATION 
>   exec/java-exec/src/test/resources/queries/tpch/11_1.sql PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/33523/diff/
> 
> 
> Testing
> -------
> 
> Added tests for NOT-IN, EXISTS, Inequality.  Enabled TPCH-16, TPCH-15 and a slight variant
of TPCH-11.
> Ran all unit tests. One test: TestDisabledFunctionality.testSubqueryWithoutCorrelatedJoinCondition
encounters an error in NLJ execution which will be fixed shortly. This test is marked ignored
for now. 
> 
> Full functional/TPCH-100 tests are in progress.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message