impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Date Wed, 01 Feb 2017 23:37:20 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................


Patch Set 4:

(15 comments)

I think this is looking pretty good. It turns out to hit a lot of interesting corner cases
in the backend but I think we just need to make sure we've got them covered and tests for
them all.

http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 122:   if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_
- 1)->Close(state);
Can you add a comment that this only applies to passthrough? E.g. "The previous child may
have been left open if passthrough was enabled for it". Otherwise it's hard to figure out
how it fits in with the rest of it.


Line 133:       row_batch->set_num_rows(limit_ - num_rows_returned_);
There's a corner case that breaks this calculation. The problem is that 'row_batch' may be
non-empty when GetNext() is called. E.g. the subplan node does this. I think we just need
to save the value of num_rows() before calling GetNext() and adjust the calculation accordingly.

I think we're missing a test case where we have a union node with a limit under a subplan.


Line 150:   if (child_idx_ < children_.size() ||
Is this just an optimisation? Might be best to remove it and keep the code simpler unless
we have data showing it's a bottleneck. If I understand it correctly it only helps on the
last GetNext() call.

If we keep it we should initialise tuple_buf to NULL so if we mess up it's more debuggable.


http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 99:   /// Returns true if the child can be passed through.
Nit: "if the child at 'idx'"


http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 541:   /// Return true if the tuple ids of this descriptor match tuple ids of other desc.
This comment needs updating to reflect the new behaviour.


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 228:     if (this.getNullIndicatorByte() != other.getNullIndicatorByte()) return false;
Also need to compare the NullIndicatorBit().

There are some subtle differences with how we compute the mem layout for Kudu tables can have
non-nullable slots, so I think there's an interesting test where we union the output of an
aggregate function like count() (where slots are non-nullable and don't get a null bit) and
a Kudu table with a non-nullable Kudu column, which gets a null bit regardless.


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1536:     if (ctx_.hasSubplan()) unionNode.disablePassthrough();
Add a TODO to remove this as part of IMPALA-4179. Otherwise I might forget.


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 68:   // If false, no child nodes would be passed through in the backend.
nit: "batches from child nodes" to be a bit clearer


Line 71:   // Indicates which child nodes can be passed through in the backend.
nit: "batches from child nodes" to be a bit clearer


PS4, Line 176: /*
/**


Line 177:    * Compute the children for which rows can be forwarded by the Union node without
being
It's a little unclear what the input is.

Maybe "Compute whether we can pass through rows from a child whether 'childExprList' is evaluated
over a row with 'childTupleIds'"


Line 183:     // Pass through is only done for the simple case where the row has a single
tuple.
What's the motivation for this? Is it because the union output is always a row with a single
tuple?


Line 266:             analyzer, children_.get(i).getTupleIds(), resultExprLists_.get(i)));
I *think* in principle we may need to also check the nullable tuple IDs, since the output
tuple should be non-nullable but the input could be nullable in theory. In practice I don't
think it's possible for a row with 1 tuple but it would be better to be conservative.

Alex probably has more insight.


Line 299:     if (!passThrough_.isEmpty()) {
We probably don't want to print this at explain_level MINIMAL.


http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:

Line 3103: ====
Can you add a couple of tests where there is a table scan on one branch of the union and a
hash join on the other? I didn't see much test coverage of joins in unions. The table scan
should be passed through and the hash join shouldn't.

The idea is that hash join outputs a row with multiple tuples. With an inner join both are
non-nullable, with an outer join, the right tuple is nullable.

I think it would be good to have both planner and end-to-end tests along those lines.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message