impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Date Tue, 21 Mar 2017 00:26:23 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
......................................................................


Patch Set 19:

(14 comments)

nice cleanup

http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 111:   DCHECK_LT(child_idx_, children_.size());
this DCHECK should come before any use of child_idx_ since if it's violated, it doesn't make
sense to call e.g. IsChildPassthrough() with chidl_idx_.


Line 114:     DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc()));
what do these dchecks have to do with child_eos_? i.e. shouldn't they true regardless of the
value of child_eos_?


PS19, Line 132: child_idx_++
++child_idx_ per our style.


PS19, Line 147: child_idx_ < children_.size()
Would make more sense as HasMoreMaterialized()


PS19, Line 150: children
children that need materialization


PS19, Line 153: Row
Child row batch


Line 168:       // There are only 3 ways of getting out of this loop:
it would be helpful to concisely say what this loop is responsible for given it's complexity.
 Something like:
This loop fetches row batches from a single child and materializes each output row, until
one of these conditions:
1) ...


Line 193:         COUNTER_SET(rows_returned_counter_, num_rows_returned_);
should we do: child_batch_.reset() here? we shouldn't ever reference it again, but seems cleaner
to keep the invariant that child_batch_ always corresponds to the open child (otherwise its
memory references aren't valid).


PS19, Line 200:     // We end up here iff one of the following is true (or both).
              :     // 1. We are done consuming all batches from the current child and we
need to move on
              :     //    to the next child.
              :     // 2. The output row batch is at capacity.
this could be summarized for a quicker read:

// Either we've finished the current child or the output batch is at capacity, or both.


PS19, Line 204:  In other words, the only way to not end up here if we entered the outer loop
is if
              :     // the limit is reached.
rather than say that in a comment, how about:

DCHECK(!ReachedLimit());


Line 263:       (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state));
shouldn't we put lines 254-263 inside a do-while loop, so that we don't return partially filled
row-batches when we still have output to produce?  Returning partially filled row-batches
is legal but can be confusing to clients (admittedly it can happen for other reasons currently,
but it'd be nice to avoid doing that unless there's a good reason).

do {
...
while (!*eos && !row_batch->AtCapacity());


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:

PS19, Line 115: child_idx
'child_idx'


Line 129:     return child_idx_ >= first_materialized_child_idx_ && child_idx_
< children_.size();
this suggestion is okay to ignore, but i find these kind of conditions easier to read as:

x <= y && y < z

so that it looks more close to how you'd read it in math: x <= y < z.

That said, I don't see how the condition:

first_materialized_child_idx_ <= child_idx_ < children_.size() 

is correct for whether there are more materialized children. i.e. when child_idx_ < first_materialized_child_idx,
we still have more materalized children, right?

So, shouldn't this be:

// We have children that need materialization and haven't processed them all yet.
first_materizlied_child_idx_ != children_.size() && child_idx_ < children_.size()


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc
File be/src/exprs/slot-ref.cc:

Line 94:     DCHECK(false);
this is confusing. either it's an invariant or it's not. it's even more confusing because
now it makes this case and the one at line 79 different for no good reason, and future code
readers won't know why.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message