impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) MT: Planner for multi-threaded execution
Date Wed, 27 Apr 2016 01:40:46 GMT
Alex Behm has posted comments on this change.

Change subject: MT: Planner for multi-threaded execution

Patch Set 1:

Commit Message:

Line 18: section yet (which will happen at a later date, to cover more corner cases).
Mention that the changes are intentionally additive so as to not break the existing functionality,
and that eventually the parallel planning might be merged into single and distributed plan
File be/src/service/

Line 391:       case TImpalaQueryOptions::MT_NUM_CORES: {
How about NUM_THREADS? Reasons:

* more accurate (no guarantee that you'll actually get those many real cores)
* more consistent with NUM_NODES
* avoids potential confusion with V_CPU_CORES
File fe/src/main/java/com/cloudera/impala/common/

Line 27:   protected TreeNode() {}
if this is to prevent public construction, add comment

Line 41:   public void removeChildren() { children_.clear(); }
clearChildren() seems clearer
File fe/src/main/java/com/cloudera/impala/planner/

Line 1: // Copyright 2012 Cloudera Inc.
2016 and elsewhere
File fe/src/main/java/com/cloudera/impala/planner/

Line 617:       unionNode.removeChildren();
can be moved above this block (needs to be done in any case)

Line 622:       // TODO: provide a more rigid interface for this, this feels error-prone.
are we really going to address this TODO? if it's easy enough might as well do it now, or
alternatively remove it
File fe/src/main/java/com/cloudera/impala/planner/

Line 59:   public HashTableId getHashTableId() { return hashTableId_; }
Use JoinBuildId and move to JoinNode?

Line 146:       if (hashTableId_.isValid()) {
I'd prefer to only add this in VERBOSE and EXTENDED
File fe/src/main/java/com/cloudera/impala/planner/

Line 38: public class HashTableBuildNode extends PlanNode {
Why is this a PlanNode and not a DataSink?  Seems more logical since a join build does not
have an "output" in the GetNext() sense.

Might as well call this JoinBuildSink.

Line 45:    * Moves the build input of joinNode to this node (and connects this node
remove parenthesis, that part seems important

Line 69:     super.init(analyzer);
set cardinality and hosts, Preconditions check that there are no conjuncts
File fe/src/main/java/com/cloudera/impala/planner/

Line 64:   // TODO: change to a more generic abstraction, NLJs don't have hash tables
This TODO sounds easy enough to address now (just rename the new stuff)
File fe/src/main/java/com/cloudera/impala/planner/

Line 34:  * The plans that produce intermediate hash tables for a particular plan P are grouped
no need for the "P" (it's not used anywhere else)

Line 51:    * Given a distributed plan (expressed as the root of a tree of plan fragments),
suggest removing the part in parenthesis and renaming the "plan" param to "root"

Line 73:"createbuildplans fragment " + fragment.getId().toString());
let's not forget to eventually remove some of this logging

Line 101:     if (node instanceof JoinNode) {
we should also only traverse the left child of a SubplanNode
File fe/src/main/java/com/cloudera/impala/planner/

Line 78:   private CohortId cohortId_;
needs comment or explanation in class comment

Line 111:"new fragment " + id.toString());
File fe/src/main/java/com/cloudera/impala/planner/

Line 172:     return planner.createPlans(distrPlan.get(0));
squeeze this in here:

ctx_.getRootAnalyzer().getTimeline().markEvent("Parallel plan created")
File fe/src/test/java/com/cloudera/impala/planner/

Line 402:     testPlan(testCase,  Section.DISTRIBUTEDPLAN, queryCtx, errorLog, actualOutput);
extra space

Line 404:     if (!testCase.getSectionContents(Section.PARALLELPLANS).isEmpty()) {
The getSectionContents() check is already performed in testPlan() below.

Not checking that here has the advantage that we'll run all tests through the ParallelPlanner
even if we don't have an expected test section for it.
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 334: |  |  plan id: 01
I'm not sure the plan id is really necessary. I'm ok leaving it but I'd suggest condensing
the explain output to something like:

plan-id=01 build-id=00 cohort-id=01

I also think we should print the plan-id for all nodes to that it becomes clearer what it
 actually is.

Line 351: |  |--32:BUILD [01]
How about JOIN BUILD? That seems meaningful and generic enough to also cover the nested loop
join case.

Line 356: |  |  |  tuple-ids=4 row-size=0B cardinality=unavailable
even though it's not needed, it would be nice to preserve the cardinality and per-host-mem

Line 5105: |  |  |  build expressions: 

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c34dd3f9190a131e6f03d901b4bfcd164a5174
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Marcel Kornacker <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-HasComments: Yes

View raw message