impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Date Wed, 15 Feb 2017 23:42:40 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
......................................................................


Patch Set 7:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 145:   DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc()));
This should be IsPrefixOf() because we sanity checking the row descriptors of exec nodes (and
not row batches).


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 115:   // passthrough case, the child can be closed right away because the row batch
received
the child can be closed right away -> the child was already closed?


Line 116:   // from the child is copied (more details below).
accuracy: the row batch wasn't copied


Line 121:   if (child_idx_ < children_.size() && isChildPassThrough(child_idx_))
{
Suggest comment:
// Handle passthrough children. We pass 'row_batch' directly into the GetNext() call on the
child.


Line 122:     // If the rows from the current child can be passed through, the physical row
layout
This comment doesn't seem to add anything, I suggest removing it.


Line 131:     // It may be possible that the row batch is not empty, so we save the previous
value.
More details would be helpful. If the batch has rows at this point, I'd imagine it can cause
all sorts of other problems. How can the batch already have rows?


Line 148:       // 'needs_deep_copy' lets us safely close the child in the next GetNext()
call.
style: 'needs_deep_copy' is not a visible variable here, suggest just saying "Marking the
batch as needing a deep copy let's us ...


Line 154: 
DCHECK that child_idx_ is not passthrough here


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 67:   // Which children can be passed through, without being materialized.
... without evaluating and materializing their exprs.


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 412:   /// Return true if the physical layout of this descriptor matches the physical
layout
matches that of other_desc


Line 413:   /// of other_desc, but not necessarily ids.
bot not necessarily the id.


Line 565:   /// of other_desc, but not necessarily ids.
but not necessarily the ids


http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h
File be/src/service/query-options.h:

Line 38:       TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\
I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a little easier to understand.
What do you think?


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 227:   public boolean isEquivalent(SlotDescriptor other) {
Unfortunately, the term 'equivalent' already has a different meaning in the FE for slots,
so it would be good to the existing term fro this new one. Maybe isLayoutEquivalent()?


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:

Line 616:   public List<Expr> getUnionNodeResultExprs() {
getUnionResultExprs()


http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 47:  * a child has an identical tuple layout as the output of the union node, the
... as the output of the union node, and the child only has naked SlotRefs as result exprs,
then the child is marked as 'passthrough'. The rows of passthrough children are directly returned
by the union node, instead of materializing the child's result exprs into new tuples.


Line 57:   protected final List<Expr> resultExprs_;
unionResultExprs_ to make distinguish it from the resultExprLists_ and such


Line 73:   // If false, no batches from child nodes would be passed through in the backend.
Comment should describe what this flag is. Also you mean "true" right?


Line 76:   // Indicates for which child nodes batches can be passed through in the backend.
remove "in the backend" (it's clear that execution happens there)


Line 81:   protected UnionNode(PlanNodeId id, TupleId tupleId) {
Is this c'tor still needed? If not, remove.


Line 89:                       List<Expr> resultExprs, boolean isInSubplan) {
indentation, unionResultExprs


Line 182:    * Compute whether we can pass through rows without materializing for the given
child.
Can combine/simplify like this:

Returns true if rows from the child with 'childTupleIds' and 'childResultExprs' can be returned
directly by the union node (without materialization into a new tuple).

Might be good to list the conditions for passthrough compatibility.


Line 189:       Analyzer analyzer, List<TupleId> childTupleIds, List<Expr> childExprList)
{
childExprList -> childResultExprs


Line 190:     if (analyzer.getQueryOptions().isDisable_union_passthrough()) return false;
seems clearer to move this into init() and not execute any of the passthrough code


Line 193:     // TODO: Remove this as part of IMPALA-4179.
Move TODO to the class comment


Line 194:     if (isInSubplan_) return false;
same here, seems easier to move this check into init()


Line 205:     // Verify that the union node has one slot for every expression.
union node -> union tuple descriptor


Line 212:     if (resultExprs_.size() != childTupleDescriptor.getSlots().size()) return false;
I don't think this tricky check is correct because it won't allow passthrough for something
like:

select int_col, int_col, int_col from functional.alltypes
union all
select int_col, int_col, int_col from functional.alltypes


Line 218:       SlotRef unionSlot = resultExprs_.get(i).unwrapSlotRef(false);
unionSlotRef, childSlotRef


Line 220:       if (!unionTupleDescriptor.getSlots().get(i).isMaterialized()) continue;
move above the unwrapSlotRef() calls


Line 221:       Preconditions.checkState(unionSlot.getDesc().getParent().getId().equals(tupleId_));
Don't think we need this check, but something like this would be good:

Preconditions.checkStateNotNull(unionSlotRef);


Line 223:         Preconditions.checkState(!(childSlot instanceof SlotRef));
No need for this check


Line 262:     // Compute which nodes can be passed through.
which child nodes


Line 266:       // Check that if the child outputs a single tuple, then it's not nullable.
Tuple
move into computePassThrough


http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 329: # IMPALA-3586: The operand with the Kudu scan cannot be passed through because it's
not
because id is not-nullable (primary key)


Line 346: select id from functional_kudu.alltypes
do select *


http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:

Line 3104: # IMPALA-3678: Both union operands should produce rows with non-nullable slots
which
remove "should"


Line 3124: # IMPALA-3678: The Union operands that contain a join should not be passed through,
nice


Line 3184: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count for consistency


http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count for consistency


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message