impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Date Wed, 22 Feb 2017 23:13:36 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough

Patch Set 7:

File be/src/exec/

Line 145:   DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc()));
> This should be IsPrefixOf() because we sanity checking the row descriptors 
File be/src/exec/

Line 115:   // passthrough case, the child can be closed right away because the row batch
> the child can be closed right away -> the child was already closed?

Line 116:   // from the child is copied (more details below).
> accuracy: the row batch wasn't copied

Line 121:   if (child_idx_ < children_.size() && isChildPassThrough(child_idx_))
> Suggest comment:

Line 122:     // If the rows from the current child can be passed through, the physical row
> This comment doesn't seem to add anything, I suggest removing it.
Replaced this with your suggestion.

Line 131:     // It may be possible that the row batch is not empty, so we save the previous
> More details would be helpful. If the batch has rows at this point, I'd ima
Added a dcheck instead. Time made this suggestion in patch 4.

Line 148:       // 'needs_deep_copy' lets us safely close the child in the next GetNext()
> style: 'needs_deep_copy' is not a visible variable here, suggest just sayin

Line 154: 
> DCHECK that child_idx_ is not passthrough here
File be/src/exec/union-node.h:

Line 67:   // Which children can be passed through, without being materialized.
> ... without evaluating and materializing their exprs.
File be/src/runtime/descriptors.h:

Line 412:   /// Return true if the physical layout of this descriptor matches the physical
> matches that of other_desc

Line 413:   /// of other_desc, but not necessarily ids.
> bot not necessarily the id.

Line 565:   /// of other_desc, but not necessarily ids.
> but not necessarily the ids
File be/src/service/query-options.h:

Line 38:       TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\
> I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a lit
We have both positive and negative like DISABLE_CODEGEN and ENABLE_EXPR_REWRITES. I agree
that ENABLE is simpler and easier to think about. (We should rename all DISABLE options.)
File fe/src/main/java/org/apache/impala/analysis/

Line 227:   public boolean isEquivalent(SlotDescriptor other) {
> Unfortunately, the term 'equivalent' already has a different meaning in the
File fe/src/main/java/org/apache/impala/analysis/

Line 616:   public List<Expr> getUnionNodeResultExprs() {
> getUnionResultExprs()
File fe/src/main/java/org/apache/impala/planner/

Line 47:  * a child has an identical tuple layout as the output of the union node, the
> ... as the output of the union node, and the child only has naked SlotRefs 

Line 57:   protected final List<Expr> resultExprs_;
> unionResultExprs_ to make distinguish it from the resultExprLists_ and such

Line 73:   // If false, no batches from child nodes would be passed through in the backend.
> Comment should describe what this flag is. Also you mean "true" right?

Line 76:   // Indicates for which child nodes batches can be passed through in the backend.
> remove "in the backend" (it's clear that execution happens there)

Line 81:   protected UnionNode(PlanNodeId id, TupleId tupleId) {
> Is this c'tor still needed? If not, remove.
Yes, it's used if we are creating a constant node. (with no children)

Line 89:                       List<Expr> resultExprs, boolean isInSubplan) {
> indentation, unionResultExprs

Line 182:    * Compute whether we can pass through rows without materializing for the given
> Can combine/simplify like this:
Done. I don't think it's necessary to describe the passthrough conditions here. The implementation
makes it clear.

Line 189:       Analyzer analyzer, List<TupleId> childTupleIds, List<Expr> childExprList)
> childExprList -> childResultExprs

Line 190:     if (analyzer.getQueryOptions().isDisable_union_passthrough()) return false;
> seems clearer to move this into init() and not execute any of the passthrou
We need to construct a list of booleans to indicate if the child can be passed through. We
would have to then construct a list of false in init() if passthrough is disabled. I think
it's simpler if we keep it the way it is.

Line 193:     // TODO: Remove this as part of IMPALA-4179.
> Move TODO to the class comment
This TODO seems a little out of place in the class comment. Won't we have to give additional
information there for this comment to make sense.

Line 194:     if (isInSubplan_) return false;
> same here, seems easier to move this check into init()
Same as above. It's weird to special case adding a false to the list.

Line 205:     // Verify that the union node has one slot for every expression.
> union node -> union tuple descriptor

Line 212:     if (resultExprs_.size() != childTupleDescriptor.getSlots().size()) return false;
> I don't think this tricky check is correct because it won't allow passthrou
Created a JIRA for handling advanced passthrough cases.

Line 218:       SlotRef unionSlot = resultExprs_.get(i).unwrapSlotRef(false);
> unionSlotRef, childSlotRef

Line 220:       if (!unionTupleDescriptor.getSlots().get(i).isMaterialized()) continue;
> move above the unwrapSlotRef() calls

Line 221:       Preconditions.checkState(unionSlot.getDesc().getParent().getId().equals(tupleId_));
> Don't think we need this check, but something like this would be good:

Line 223:         Preconditions.checkState(!(childSlot instanceof SlotRef));
> No need for this check

Line 262:     // Compute which nodes can be passed through.
> which child nodes

Line 266:       // Check that if the child outputs a single tuple, then it's not nullable.
> move into computePassThrough
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 329: # IMPALA-3586: The operand with the Kudu scan cannot be passed through because it's
> because id is not-nullable (primary key)

Line 346: select id from functional_kudu.alltypes
> do select *
With select *, passthrough doesn't get enabled. The layout of the union tuple is different
that the layout of the child tuples.
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:

Line 3104: # IMPALA-3678: Both union operands should produce rows with non-nullable slots
> remove "should"

Line 3124: # IMPALA-3678: The Union operands that contain a join should not be passed through,
> nice

Line 3184: select COUNT(c.c_custkey), COUNT(v.tot_price)
> lowercase count for consistency
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price)
> lowercase count for consistency

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message