drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jinfeng Ni" <...@maprtech.com>
Subject Re: Review Request 34006: DRILL-2958: Move Drill to alternative cost-based planner for Join planning
Date Sat, 09 May 2015 15:13:49 GMT


> On May 8, 2015, 10:57 p.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java,
line 529
> > <https://reviews.apache.org/r/34006/diff/2/?file=954207#file954207line529>
> >
> >     Is the criteria for using Lopt optimizer (in terms of number of tables above
a certain threshold) applied internally ? We should have a Drill specific setting for it beyond
just a true/false setting.
> 
> Jinfeng Ni wrote:
>     I think it probably makes sense to avoid LOPT planner for single table query. For
any query with JOIN, since the current planer does not enable SwapJoin rule in the logical
planning phase, it may not find the optimal plan, and rely on a post-planing method to swap
join based on rowcount. In that sense, I feel LOPT planner might be a better choice even for
2 or 3 tables join.
>     
>     For single table query, since there is no join, it seems no difference between LOPT
/ the current planner. That's why I did not add a threashold.

Another reason that I did not add a threashold option is we have added a new option to swtich
between the existing planner and new planner. User could simply turn on/off that option, to
completely switch to one of them. Having both the switch option and threashold option probably
will cause more confusing, IMHO.


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34006/#review83134
-----------------------------------------------------------


On May 8, 2015, 5:20 p.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34006/
> -----------------------------------------------------------
> 
> (Updated May 8, 2015, 5:20 p.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill current use VolcanoPlanner in join planning. This planner has two known issues:
> 
> 1. The search space is increased exponentially with increased # of tables joined. If
query has more than > 10 tables join, the planning time itself could be minutes, if not
longer.
> 
> 2. Drill did not enable a rule to swap both sides of join, due to the search space problem.
We only do a swap join afterwards. See DRILL-2236. This means the join order chosen by Drill's
VolcanoPlanner might not be optimal.
> 
> To address the above two issues, we are going to provide another planner for the purpose
of join ordering planning. This planner will use a different optimization rules, and the search
space is not increased exponentially with # of table. 
> 
> The main logic of this new planner:
> 1) Let VolcanoPlanner do all the rule transformations same as the current planner's logical
planning, except for the join permutation rule.
> 2) After that, pass to HepPlanner with Calcite LOPT optimization rule, to let it do the
join ordering. Feed with the HepPlanner with Drill's RelMetaDataProvider, to leverage the
statistics (rowcount) available in Drill's table/files. 
> 3) Continue with the same physical planning as before.
> 
> With the limited statistics available in Drill, the new planner seems to produce better
query plan than the current, for several TPCH queries. 
> 
> Preliminary performance results show this planner run faster than the existing one, and
the join plan seems to be same or better than the plan chosen by the existing planner. 
> 
> Will update more in detail about the comparison.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java
5ab416c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java
42ef6ac 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillDefaultRelMetadataProvider.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterRel.java
dbd08f4 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillJoinRel.java
dcccdb0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java
6e132aa 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java
2981de8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
53e1bff 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
7d8dd97 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
3c78c08 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
eda1b5f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
4d8b034 
> 
> Diff: https://reviews.apache.org/r/34006/diff/
> 
> 
> Testing
> -------
> 
> Unit test / Regression suite.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message