impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Date Tue, 21 Mar 2017 02:29:11 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
......................................................................


Patch Set 19:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 111:   DCHECK_LT(child_idx_, children_.size());
> this DCHECK should come before any use of child_idx_ since if it's violated
Done


Line 114:     DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc()));
> what do these dchecks have to do with child_eos_? i.e. shouldn't they true 
Yes, They should be true. I think the idea was to call this once per child. I actually think
it's safer to call this every time because currently it does not get called for the first
child (which gets opened in Open()).

Changed it so that it gets called every time.


PS19, Line 132: child_idx_++
> ++child_idx_ per our style.
Done


PS19, Line 147: child_idx_ < children_.size()
> Would make more sense as HasMoreMaterialized()
Done. Even though this does an extra unnecessary check (first_materialized_child_idx_ <=
child_idx_), this is cleaner.


PS19, Line 150: children
> children that need materialization
Done


PS19, Line 153: Row
> Child row batch
Done


Line 168:       // There are only 3 ways of getting out of this loop:
> it would be helpful to concisely say what this loop is responsible for give
Done


Line 193:         COUNTER_SET(rows_returned_counter_, num_rows_returned_);
> should we do: child_batch_.reset() here? we shouldn't ever reference it aga
I see. So the invariant should be if child_batch references something (it's not reset), then
it corresponds to an open materialized child.

Added reset() here. By the way, reset() will be called twice, once here and once in Close().


PS19, Line 200:     // We end up here iff one of the following is true (or both).
              :     // 1. We are done consuming all batches from the current child and we
need to move on
              :     //    to the next child.
              :     // 2. The output row batch is at capacity.
> this could be summarized for a quicker read:
Changed the structure along the lines of what you suggested, and cleaned this up.


PS19, Line 204:  In other words, the only way to not end up here if we entered the outer loop
is if
              :     // the limit is reached.
> rather than say that in a comment, how about:
Done


Line 263:       (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state));
> shouldn't we put lines 254-263 inside a do-while loop, so that we don't ret
Unfortunately that wouldn't work.

For passthrough, we simply forward as they are. Even if they are partially filled, we just
forward them. That's kind of the point of this patch.

Maybe we could put materialized and const in a while loop, but that would make the code messier
and the benefit would be minimal (up to 1 fewer partially filled row batch per union node
per entire cluster).


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:

PS19, Line 115: child_idx
> 'child_idx'
Done


Line 129:     return child_idx_ >= first_materialized_child_idx_ && child_idx_
< children_.size();
> this suggestion is okay to ignore, but i find these kind of conditions easi
Reordered it as you suggested. (By the way, in Python you can actually write "w < x <=
y < z")

first_materialized_child_idx_ points to the first materialized child. So everything after
it is materialized. that means if child_idx_ is greater or equal to materialized_child_idx_
then it's materialized.

It's the exact opposite of passthrough (where we check if child_idx_ is less than).


http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc
File be/src/exprs/slot-ref.cc:

Line 94:     DCHECK(false);
> this is confusing. either it's an invariant or it's not. it's even more con
Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message