impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Date Tue, 10 Jan 2017 23:34:20 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows

Patch Set 7:

File be/src/exec/

Line 388:           runtime_state_, Substitute(PREPARE_FOR_READ_FAILED_ERROR_MSG, id_));
> can't this cause us to fail a query that we used to be able to execute, sin
As discussed, this is okay because we would've needed at least as much memory to go through
the repartitioning path anyways.

Also, I looked into moving PrepareForRead into OutputUnmatchedBuild to minimize the time we
have the memory used, but I don't think this works, at least not without complicating the
code, as OutputUnmatchedBuild may be called multiple times for the same partition if we reach
out_batch->AtCapacity and calling PrepareForRead each time would be incorrect. And anyways,
OutputUnmatchedBuild will be called almost immediately after this, so the difference in memory
usage is minor.

PS6, Line 652: There were no probe rows so we skipped building the hash table. In this case,
> this comment doesn't explains more "what" than "why", and what is fairly cl

Line 677:     if (output_unmatched_batch_iter_->AtEnd()) {
> Something I missed in the first pass. 'output_unmatched_batch_' may have ha
I  played around with this for awhile without success.

In particular, it seemed like for the queries that i ran, no resources were attached to output_unmatched_batch_.
Is there a specific situation in which you expect this to happen?

Line 702:     output_build_partitions_.pop_front();
> why is this the case?
See the comment at the top of this function.
File be/src/exec/partitioned-hash-join-node.h:

PS6, Line 291:  will
             :   /// be the on
> seems redundant with "first"

PS6, Line 293: titions_'.
             :   Status OutputAllBuild(RowBatch* out_batch);
> this seems to contradict the "first partition of 'output_build_partitions_'

PS6, Line 306: 
> doesn't it increment the out_batch_iterator?
No, out_batch_iterator.Next() is called after OutputBuildRow, depending on the return of EvalConjuncts.

I could certainly move that logic into OutputBuildRow if you think its clearer, it would just
give OutputBuildRow a somewhat long argument list.

Line 362:   /// rows. If there are no probe rows, we just prepare the build side to be read
> this comment is now stale.

Line 409:   RuntimeProfile::Counter* null_aware_eval_timer_;
> add a comment to explain it. e.g. what does it mean to be "skipped"?

PS6, Line 477: putAllBuild() to iterat
> this also seems to contradict the earlier claim that there is only one part

Line 479:   std::unique_ptr<RowBatch> output_unmatched_batch_;
> this comment is tough to understand if you haven't already read the code in
File testdata/workloads/tpch/queries/tpch-outer-joins.test:

Line 32
To be clear, I verified that this test is still testing the situation from the referenced
JIRA without these lines.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message