impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Date Mon, 02 Oct 2017 23:40:05 GMT
Tim Armstrong has posted comments on this change. ( )

Change subject: IMPALA-4236: Codegen CopyRows() for select nodes

Patch Set 1:

Commit Message:
PS1, Line 8: 
Can you add a testing section? It would be good to mention what test coverage we have for
this code path (it's certainly possible that we have some gaps).
PS1, Line 9: Preformance
PS1, Line 10: perdicates
PS1, Line 11: [column_name > value AND]
Can you include a concrete example of the query to make it easier for others to reproduce?
PS1, Line 18: | Select Node  | 12s385ms   | 1m1s        | 234ms      | 797ms       |
File be/src/exec/
PS1, Line 29: child_row_batch_->num_rows()
This probably doesn't matter, but you could save a few instructions by hoisting this out of
the loop. E.g.

  int child_batch_num_rows = ...;

Pretty sure the compiler can't infer that num_rows() is constant across loop iterations (since
'this' and 'this->child_row_batch_' could alias memory that's written in the loop so it
will end up chasing the pointers on every iteration.
PS1, Line 41:       COUNTER_SET(rows_returned_counter_, num_rows_returned_);
COUNTER_SET can be pretty expensive because of the atomic operation. We can actually move
it out of this function entirely into the caller.
File be/src/exec/
PS1, Line 55: NULL
Use nullptr in new code (here and below).
PS1, Line 65:   if (codegen_status.ok()) {
Instead of this branch the control flow may be easier to follow if you factor out the CopyRows
codegen logic into a separate Status-returning function that returns early on an error.

This will be more true if you start checking FinalizeFunction() below.
PS1, Line 71:     DCHECK(copy_rows_fn != NULL);
We should probably check if this is NULL and fail codegen if so. I'm not sure if we do this
everywhere in the code but we probably should be doing so (although it isn't super-important,
since we shouldn't be failing verification in the first place).

(The invariants around this function are a bit wonky at the moment since we don't really have
good testing around this, but FinalizeFunction() is documented as returning NULL in some cases).
PS1, Line 99:   bool copy_rows_success = false;
Just declare it inside the loop where it's needed? When I see it declared up here I tend to
think that the value lives across loop iterations.
PS1, Line 114:     copy_rows_success = false;
It's assigned on both branches so this isn't necessary.

I think some people consider this good defensive coding practice but it seems unnecessary
to me.

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e
Gerrit-Change-Number: 8196
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Mon, 02 Oct 2017 23:40:05 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message