impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Date Thu, 02 Feb 2017 01:12:39 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................


Patch Set 5:

(42 comments)

http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG
Commit Message:

Line 14: Testing:
Would be nice to get some idea of the perf improvement. The JIRA has an interesting query.
A small microbenchmark would also be useful.


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 102
There was a specific reason why Open() was called here. There is an expectation that GetNext()
returns quickly after Open() is called. This expectation has to do with our client/server
interaction and the query state transitions. The query goes into the FINISHED state once Open()
succeeded on the coordinator fragment.

Does test_rows_availability.py succeed?


Line 62:   pass_through_children_ = tnode.union_node.pass_through;
move to initializer list in cosntructor


Line 122:   if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_
- 1)->Close(state);
Needs comment


Line 124:   if (child_idx_ < children_.size() && isPassThrough(child_idx_)) {
High-level comment what is happening here (passthrough).


Line 140:     if (child_eos_) {
Add a comment why it's not ok to Close() the child in this passthrough mode even if the child
is at eos.

In the meterialization case below, we can and do close the child as soon as possible.


Line 141:       row_batch->MarkNeedsDeepCopy();
Needs comment


Line 150:   if (child_idx_ < children_.size() ||
Might as well reverse this check and move it up, set eos to true and return OK. Easier to
see that we're just going to skip the remaining code.


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 36: /// Node that merges the results of its children by materializing their
update comment


Line 79:   std::vector<bool> pass_through_children_;
is_child_passthrough_?

The existing variable name makes it sound like it is a list of children that are pass through,
but there is actually an entry for all children.

Move this member out of the "Members that need to be reset()" section.


Line 100:   inline bool isPassThrough(int idx) {
idx -> child_idx

const method


Line 101:     DCHECK(idx < pass_through_children_.size());
DCHECK_LT


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 132:   if (this->type() != other_desc.type()) return false;
can get rid of this->


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 554:   /// Comparison is done by the contents of the tuple descriptors and not the ids.
I'd prefer to preserve the meaning of these existing functions (IsPrefixOf() and Equals().
We have several interesting DCHECKs that require the ids (and not just the physical layout)
to be identical.

If you really need these functions we should give them new names kind of like we have in the
FE for this check.


http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 433:   // List of booleans that indicates which children can be passed through
Remove "List of booleans" since that's redundant


Line 435:   4: required list<bool> pass_through
is_child_passthrough (good to keep names consistent)


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 222: 
Would be good to keep the name of this function and the BE equivalent the same.


Line 223:   public boolean hasEqualPhysicalLayout(SlotDescriptor other) {
needs brief comment


Line 224:     if (!this.getType().matchesType(other.getType())) return false;
shouldn't the types be equal()?


Line 226:     if (this.getByteSize() != other.getByteSize()) return false;
can remove 'this'


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:

Line 155:   // List of output expressions of the Union node. This should be the same as result
List of output expressions produced by the union without the ORDER BY portion (if any). Same
as resultExprs_ if there is no ORDER BY.


Line 156:   // resultExprs_ if the UnionStmt does not have an Order By. Otherwise resultExprs_
Let's avoid referring to specific plan nodes at this stage and instead try to describe 'semantically'
what these exprs contain.


Line 158:   private List<Expr> unionNodeResultExprs_ = Lists.newArrayList();
unionResultExprs_


Line 191:     for (Expr e: other.unionNodeResultExprs_) unionNodeResultExprs_.add(e.clone());
use Expr.cloneList()


Line 501:       for (UnionOperand op: operands_) {
combine with the loop over operands_ in L526


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 46:  * the corresponding result exprs into a new tuple.
update comment to reflect passthrough capability


Line 56:   // List of output expressions of the Union node.
List of union result exprs of the originating UnionStmt. Used for determining passthrough-compatibility
of children.

No need to new this list.

final

Move this above resultExprLists_ to hopefully minimize confusion by visual separation.


Line 69:   protected boolean passThroughEnabled = true;
Seems clearer to make this isInSubplan_ or something and pass that in the constructor as well.
The class comment should describe the passthrough capability and why we don't want it inside
a subplan.


Line 72:   protected List<Boolean> passThrough_ = Lists.newArrayList();
isChildPassthrough (at least something that is consistent across FE/BE and thrift)


Line 76:   protected UnionNode(PlanNodeId id, TupleId tupleId) {
remove?


Line 263:     TupleDescriptor this_tuple_desc = analyzer.getDescTbl().getTupleDesc(tupleId_);
use FE camel-case style


Line 281:     msg.union_node = new TUnionNode(
add Preconditions check that asserts the correct size of passThrough


Line 299:     if (!passThrough_.isEmpty()) {
Isn't this always non-empty?


Line 300:       List<String> passThroughNodes = Lists.newArrayList();
passThroughNodeIds


Line 308:             Joiner.on(", ").join(passThroughNodes) + "\n");
nit: we usually don't add a spaces after the comma for explain plan stuff


Line 309:       }
might be nice to produce "all" instead of listing all plan node ids if all children are passthrough


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 187: |  pass through nodes: 01, 02
pass-through-operands:

(to be consistent with our existing "constant-operands")


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/empty.test
File testdata/workloads/functional-planner/queries/PlannerTest/empty.test:

Line 506: # IMPALA-3586: Verify that Union pass through is disabled in subplans.
Might be good to add a separate test for this, since this one is kind of weird for it (unions
with only a single operand)


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:

Line 3085: |     partitions=4/4 files=4 size=460B
Add a new Kudu planner test (kudu.test) for:
select * from functional.alltypes
union all
select * from functional_kudu.alltypes

The operand with the Kudu scan cannot be passed through.

However if both operands are Kudu scans, then they can be passed through.


http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children
No need to describe the failure mode of that specific bug you hit during development.

Better to describe what case this test is covering: Input tuples that have non-nullable slots.

I believe that this should now do passthrough right?


Line 1082: # IMPALA-3586: Test the case where no nodes are passed though.
Is this not already covered?


Line 1093: # IMPALA-3586: Test the case where 1 node is passed though, and one is not.
Not already covered?


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message