impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) MT: Planner for multi-threaded execution
Date Wed, 27 Apr 2016 01:40:46 GMT
Alex Behm has posted comments on this change.

Change subject: MT: Planner for multi-threaded execution
......................................................................


Patch Set 1:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/2846/1//COMMIT_MSG
Commit Message:

Line 18: section yet (which will happen at a later date, to cover more corner cases).
Mention that the changes are intentionally additive so as to not break the existing functionality,
and that eventually the parallel planning might be merged into single and distributed plan
creation.


http://gerrit.cloudera.org:8080/#/c/2846/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 391:       case TImpalaQueryOptions::MT_NUM_CORES: {
How about NUM_THREADS? Reasons:

* more accurate (no guarantee that you'll actually get those many real cores)
* more consistent with NUM_NODES
* avoids potential confusion with V_CPU_CORES


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/common/TreeNode.java
File fe/src/main/java/com/cloudera/impala/common/TreeNode.java:

Line 27:   protected TreeNode() {}
if this is to prevent public construction, add comment


Line 41:   public void removeChildren() { children_.clear(); }
clearChildren() seems clearer


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/CohortId.java
File fe/src/main/java/com/cloudera/impala/planner/CohortId.java:

Line 1: // Copyright 2012 Cloudera Inc.
2016 and elsewhere


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 617:       unionNode.removeChildren();
can be moved above this block (needs to be done in any case)


Line 622:       // TODO: provide a more rigid interface for this, this feels error-prone.
are we really going to address this TODO? if it's easy enough might as well do it now, or
alternatively remove it


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java
File fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java:

Line 59:   public HashTableId getHashTableId() { return hashTableId_; }
Use JoinBuildId and move to JoinNode?


Line 146:       if (hashTableId_.isValid()) {
I'd prefer to only add this in VERBOSE and EXTENDED


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/HashTableBuildNode.java
File fe/src/main/java/com/cloudera/impala/planner/HashTableBuildNode.java:

Line 38: public class HashTableBuildNode extends PlanNode {
Why is this a PlanNode and not a DataSink?  Seems more logical since a join build does not
have an "output" in the GetNext() sense.

Might as well call this JoinBuildSink.


Line 45:    * Moves the build input of joinNode to this node (and connects this node
remove parenthesis, that part seems important


Line 69:     super.init(analyzer);
set cardinality and hosts, Preconditions check that there are no conjuncts


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/JoinNode.java
File fe/src/main/java/com/cloudera/impala/planner/JoinNode.java:

Line 64:   // TODO: change to a more generic abstraction, NLJs don't have hash tables
This TODO sounds easy enough to address now (just rename the new stuff)


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/ParallelPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/ParallelPlanner.java:

Line 34:  * The plans that produce intermediate hash tables for a particular plan P are grouped
no need for the "P" (it's not used anywhere else)


Line 51:    * Given a distributed plan (expressed as the root of a tree of plan fragments),
suggest removing the part in parenthesis and renaming the "plan" param to "root"


Line 73:     LOG.info("createbuildplans fragment " + fragment.getId().toString());
let's not forget to eventually remove some of this logging


Line 101:     if (node instanceof JoinNode) {
we should also only traverse the left child of a SubplanNode


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/PlanFragment.java
File fe/src/main/java/com/cloudera/impala/planner/PlanFragment.java:

Line 78:   private CohortId cohortId_;
needs comment or explanation in class comment


Line 111:     LOG.info("new fragment " + id.toString());
remove


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/main/java/com/cloudera/impala/planner/Planner.java
File fe/src/main/java/com/cloudera/impala/planner/Planner.java:

Line 172:     return planner.createPlans(distrPlan.get(0));
squeeze this in here:

ctx_.getRootAnalyzer().getTimeline().markEvent("Parallel plan created")


http://gerrit.cloudera.org:8080/#/c/2846/1/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

Line 402:     testPlan(testCase,  Section.DISTRIBUTEDPLAN, queryCtx, errorLog, actualOutput);
extra space


Line 404:     if (!testCase.getSectionContents(Section.PARALLELPLANS).isEmpty()) {
The getSectionContents() check is already performed in testPlan() below.

Not checking that here has the advantage that we'll run all tests through the ParallelPlanner
even if we don't have an expected test section for it.


http://gerrit.cloudera.org:8080/#/c/2846/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 334: |  |  plan id: 01
I'm not sure the plan id is really necessary. I'm ok leaving it but I'd suggest condensing
the explain output to something like:

plan-id=01 build-id=00 cohort-id=01

I also think we should print the plan-id for all nodes to that it becomes clearer what it
 actually is.


Line 351: |  |--32:BUILD [01]
How about JOIN BUILD? That seems meaningful and generic enough to also cover the nested loop
join case.


Line 356: |  |  |  tuple-ids=4 row-size=0B cardinality=unavailable
even though it's not needed, it would be nice to preserve the cardinality and per-host-mem


Line 5105: |  |  |  build expressions: 
whitespace


-- 
To view, visit http://gerrit.cloudera.org:8080/2846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c34dd3f9190a131e6f03d901b4bfcd164a5174
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message