drill-dev mailing list archives

From "Aman Sinha" <asi...@maprtech.com>
Subject Re: Review Request 34006: DRILL-2958: Move Drill to alternative cost-based planner for Join planning
Date Sat, 09 May 2015 05:57:46 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34006/#review83134
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java
<https://reviews.apache.org/r/34006/#comment134006>

    Since the main difference between this and the other costing function is the multiplication
factor, can you consolidate the two and just provide thin wrappers?
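
    One way to read this suggestion, as a minimal sketch: keep a single costing routine that takes the multiplication factor as a parameter, and let each existing entry point delegate to it. The class name, method names, cost formula, and factor values below are all hypothetical and are not taken from DrillJoinRelBase.

```java
/** Hypothetical sketch of "one shared cost function plus thin wrappers". */
class JoinCostSketch {

    // Assumed factors, for illustration only.
    static final double DEFAULT_FACTOR = 1.0;
    static final double ALTERNATIVE_FACTOR = 2.0;

    /** Shared logic: a placeholder formula scaled by the given factor. */
    static double computeJoinCost(double leftRows, double rightRows, double factor) {
        return (leftRows + rightRows) * factor;
    }

    /** Thin wrapper used by the existing costing path (hypothetical name). */
    static double defaultJoinCost(double leftRows, double rightRows) {
        return computeJoinCost(leftRows, rightRows, DEFAULT_FACTOR);
    }

    /** Thin wrapper used by the new costing path (hypothetical name). */
    static double alternativeJoinCost(double leftRows, double rightRows) {
        return computeJoinCost(leftRows, rightRows, ALTERNATIVE_FACTOR);
    }
}
```

    With this shape, a change to the formula happens in one place and the two wrappers differ only in the constant they pass.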



exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
<https://reviews.apache.org/r/34006/#comment134007>

    It is not clear why Scan should have a 'distinct' row count of 10%. What does 'distinct'
row count mean for a scan (since we are not considering filter push-down or partition pruning
here)?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java
<https://reviews.apache.org/r/34006/#comment134008>

    This currently has factories for creating logical project, filter and join. Why only
these three and not the other logical rels?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
<https://reviews.apache.org/r/34006/#comment134009>

    Remove these commented rules.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
<https://reviews.apache.org/r/34006/#comment134010>

    Is the criterion for using the Lopt optimizer (the number of tables exceeding a certain
threshold) applied internally? We should have a Drill-specific setting for it beyond just
a true/false setting.
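
    A Drill-specific setting along these lines could pair the boolean switch with a table-count threshold. The sketch below is hypothetical: the class, method, and parameter names are invented for illustration, and it does not describe how Calcite or DefaultSqlHandler actually decides.

```java
/** Hypothetical sketch: choose the join planner from two settings. */
class JoinPlannerChoice {

    /**
     * Returns true when the heuristic (Lopt-style) join planner should be used.
     *
     * @param enabled      the existing true/false switch
     * @param joinedTables number of tables joined in the query
     * @param minTables    assumed threshold option; queries joining fewer
     *                     tables keep the existing planner
     */
    static boolean useHeuristicJoinPlanner(boolean enabled, int joinedTables, int minTables) {
        return enabled && joinedTables >= minTables;
    }
}
```

    Setting the threshold to 0 would reproduce the plain true/false behavior.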


- Aman Sinha


On May 9, 2015, 12:20 a.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34006/
> -----------------------------------------------------------
> 
> (Updated May 9, 2015, 12:20 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses VolcanoPlanner for join planning. This planner has two known issues:
> 
> 1. The search space grows exponentially with the number of tables joined. If a
query joins more than 10 tables, the planning time itself could be minutes, if not
longer.
> 
> 2. Drill did not enable a rule to swap both sides of a join, due to the search space problem.
We only swap the join sides afterwards. See DRILL-2236. This means the join order chosen by Drill's
VolcanoPlanner might not be optimal.
> 
> To address the above two issues, we are going to provide another planner for
join order planning. This planner will use a different set of optimization rules, and its search
space does not grow exponentially with the number of tables.
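
To give a rough sense of the growth described in issue 1, the sketch below counts only left-deep join orders, which is n! for n tables when no orderings are pruned. This is an illustrative simplification, not a model of what VolcanoPlanner actually enumerates; the class and method names are hypothetical.

```java
/** Hypothetical sketch: size of the left-deep join-order space. */
class JoinSearchSpace {

    /** Number of left-deep join orders for n tables (n!), for 0 <= n <= 20. */
    static long leftDeepOrders(int n) {
        if (n < 0 || n > 20) {
            throw new IllegalArgumentException("n must be in [0, 20] to fit in a long");
        }
        long count = 1;
        for (int i = 2; i <= n; i++) {
            count *= i;
        }
        return count;
    }

    public static void main(String[] args) {
        // Print the count for a few query sizes.
        for (int n : new int[] {2, 5, 10, 12}) {
            System.out.println(n + " tables -> " + leftDeepOrders(n) + " left-deep orders");
        }
    }
}
```

At 10 tables the count is already 3,628,800, before bushy plans or join-side swaps are considered.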
> 
> The main logic of this new planner:
> 1) Let VolcanoPlanner do all the rule transformations, the same as in the current planner's logical
planning, except for the join permutation rule.
> 2) After that, pass the plan to HepPlanner with the Calcite LOPT optimization rule, to let it do the
join ordering. Feed the HepPlanner with Drill's RelMetadataProvider, to leverage the
statistics (row count) available in Drill's tables/files.
> 3) Continue with the same physical planning as before.
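
As a toy illustration of step 2, the sketch below orders joins with a simple greedy heuristic driven by row counts: start from the smallest table, then repeatedly add the smallest remaining table that has a join condition with something already chosen. This is a simplified stand-in and not Calcite's LOPT rule or Drill's code; the class, method, and table names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: greedy join ordering from row-count statistics. */
class GreedyJoinOrderSketch {

    /**
     * @param rowCounts estimated row count per table
     * @param joinGraph table -> tables it has a join condition with
     *                  (an edge listed in either direction is enough)
     */
    static List<String> order(Map<String, Double> rowCounts,
                              Map<String, Set<String>> joinGraph) {
        List<String> chosen = new ArrayList<>();
        Set<String> remaining = new TreeSet<>(rowCounts.keySet());
        while (!remaining.isEmpty()) {
            // Prefer tables joined to something already chosen, to avoid cross products.
            List<String> connected = new ArrayList<>();
            for (String candidate : remaining) {
                for (String table : chosen) {
                    if (joined(joinGraph, table, candidate)) {
                        connected.add(candidate);
                        break;
                    }
                }
            }
            // On the first step, or for a disconnected graph, consider every remaining table.
            Collection<String> pool = connected.isEmpty() ? remaining : connected;
            String next = smallest(pool, rowCounts);
            chosen.add(next);
            remaining.remove(next);
        }
        return chosen;
    }

    private static boolean joined(Map<String, Set<String>> graph, String a, String b) {
        return graph.getOrDefault(a, Collections.emptySet()).contains(b)
            || graph.getOrDefault(b, Collections.emptySet()).contains(a);
    }

    private static String smallest(Collection<String> tables, Map<String, Double> rowCounts) {
        String best = null;
        for (String t : tables) {
            if (best == null || rowCounts.get(t) < rowCounts.get(best)) {
                best = t;
            }
        }
        return best;
    }
}
```

Each step scans the remaining tables once per chosen table, so the work is polynomial in the number of tables rather than a search over all permutations.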
> 
> With the limited statistics available in Drill, the new planner seems to produce better
query plans than the current one for several TPCH queries.
> 
> Preliminary performance results show this planner runs faster than the existing one, and
the join plan seems to be the same as or better than the plan chosen by the existing planner.
> 
> Will update with more details about the comparison.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 5ab416c
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 42ef6ac
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillDefaultRelMetadataProvider.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterRel.java dbd08f4
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillJoinRel.java dcccdb0
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 6e132aa
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java 2981de8
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 53e1bff
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java 7d8dd97
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java 3c78c08
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java eda1b5f
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 4d8b034
> 
> Diff: https://reviews.apache.org/r/34006/diff/
> 
> 
> Testing
> -------
> 
> Unit test / Regression suite.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>

