impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Date Tue, 14 Mar 2017 18:47:51 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
......................................................................


Patch Set 14:

(13 comments)

The new code is much clearer! I think we can still improve it further though. Happy to go
over the subtleties in person if you prefer.

http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 138:   while (const_expr_list_idx_ < const_expr_lists_.size() && !row_batch->AtCapacity()
&& !ReachedLimit()) {
long line


Line 144:   if (const_expr_list_idx_ == const_expr_lists_.size()) *eos = true;
*eos = const_expr_list_idx_ == const_expr_lists_.size();


Line 196:     if (child_batch_.get() == NULL) {
nullptr


Line 210:       // There are only 3 ways of getting out of this loop:
We also break out if we fetch an empty child batch


Line 214:       RETURN_IF_ERROR(QueryMaintenance(state));
remove


Line 274:   if (const_todo_) {
Not a big deal, but I think we should do the const exprs last because most of the time this
branch will not be taken.


Line 275:     RETURN_IF_ERROR(GetNextConst(state, row_batch, &done));
why not pass &const_todo_ directly, and same for the other cases below


Line 278:     RETURN_IF_ERROR(GetNextPassThrough(state, row_batch, &done));
It might be simpler (== less error prone) overall to order the operands based on passthrough,
or create two lists of child indexes (passthrough and non-passthrough), populated in Init().


Line 282:       child_idx_ = 0;
The code here and the setting of child_idx_ in Open() is kind of subtle. There's a baked in
assumption that passthrough is done before materialized, but the code that makes that work
is spread in different places. I think using two separate child lists, or ordering the operands
would help clarify this.

I'm ok with ordering or separating, but I do have a slight preference for ordering because
then the evaluation order is clear even at the plan level. Otherwise, you need to understand
the union implementation in detail to know how it gets evaluated.


Line 286:     RETURN_IF_ERROR(GetNextMaterialized(state, row_batch, &done));
add a DCHECK here that child_eos_ is true if there were any passthrough children


Line 290:   *eos = ReachedLimit() || (!const_todo_ && !passthrough_todo_ &&
!materialize_todo_);
Isn't the last condition the same as !materialized_todo_ since we are going over the cases
strictly in order?


http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 39: /// and expressions don't need to be evaluated. The UnionNode pulls row batches
Comment says union goes through children sequentially, which makes to maintain imo.


Line 107:   Status GetNextConst(RuntimeState* state, RowBatch* row_batch, bool* eos);
Use a different name then eos, e.g. 'done' to avoid confusion with the real eos.

We might not need the param at all if we order the children or separate the child lists into
passthrough and non-passthrough


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message