impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Date Mon, 09 Oct 2017 18:56:37 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 )

Change subject: IMPALA-4236: Codegen CopyRows() for select nodes
......................................................................


Patch Set 6:

(2 comments)

Your code looks good but I'd really like to simplify this code further so that it's easier
to understand in future.

http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc
File be/src/exec/select-node.cc:

http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc@91
PS6, Line 91:   if (ReachedLimit() || (child_row_idx_ == child_row_batch_->num_rows() &&
child_eos_)) {
This is still a bit wonky (not your change but we should try to leave this in a good state)
- it's not clear when this will actually be taken. 


The invariant around GetNext() is that callers shouldn't call GetNext() after 'eos' is true.
Given that invariant, I don't think it's possible that this is taken.

* ReachedLimit() should never be true here. limit_ == 0 should be caught by the frontend and
converted into an EmptySetNode. If limit_ > 0, then we should have set *eos after copying
the rows below.
* If we hit the end of the input below on line 120, we should already have set *eos.


http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc@107
PS6, Line 107:       child_row_batch_->TransferResourceOwnership(row_batch);
Sorry for the additional churn but I think the resource transfer is overly complicated. This
actually will often attach the resources to the batch after the last batch that references
them. Instead it should attach resources to the batch as soon as the last row from the input
batch is copied to the output.

I'm thinking removing the transfer and AtCapacity() check here and changing the logic below
to something like:

  if (*eos || child_row_idx_ == child_row_batch_->num_rows()) {
    child_row_batch_->TransferResourceOwnership(row_batch);
    child_row_batch_->Reset();
  }
  if (*eos || row_batch->AtCapacity()) return Status::OK();

Or maybe it would then be clearer as a do/while() loop.

This would require some further changes, since it would reset child_row_batch_->num_rows()
to 0 - but maybe child_row_batch_->num_rows() == 0 should actually just signal that the
batch has been consumed.



-- 
To view, visit http://gerrit.cloudera.org:8080/8196
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e
Gerrit-Change-Number: 8196
Gerrit-PatchSet: 6
Gerrit-Owner: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2017 18:56:37 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message