impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Thu, 25 Aug 2016 00:50:23 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................


Patch Set 9:

(15 comments)

I'm not quite done with reviewing everything, but getting close.

http://gerrit.cloudera.org:8080/#/c/3873/9//COMMIT_MSG
Commit Message:

Line 22: spill additional partitions during probing if the probe partitions needed
your new solution is much saner, thank you!


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 206:     Status status = Status::MemLimitExceeded();
let's make the error reporting for PrepareForWrite() and PrepareForRead() more similar, e.g.,
by adding a PREPARE_FOR_WRITE_FAILED_ERROR_MSG and doing the same as for the read case


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 165:   if (expr_mem_tracker_ != NULL) {
the function comment makes it sound like we should also have:

if (closed_) return;


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

Line 35
revert?


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 310:   for (unique_ptr<Partition>& partition: all_partitions_)  partition->Close(NULL);
nit: extra space before 'partition'


Line 464:   // SwitchToIoBuffers() call below will succeed.
Does this comment make sense? Can we use small buffers and have a stream in the same partition?


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

Line 46: class PhjBuilder : public DataSink {
PartitionedHashJoinBuilder for consistency?


Line 69:   /// This function walks the current hash partitions and picks one to spill.
... and spills the largest one


Line 174:     /// transferring ownership to them.
this line doesn't seem to add anything


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 865: Status PartitionedHashJoinNode::BuildHashTablesAndPrepareProbePartitions() {
It would be good to document what exactly is handed off from the PHJ builder to the PHJ node
(e.g., in the PHJ builder class comment)

To me it would make sense to move the actual HT construction into the builder as well to accommodate
the in-memory broadcast join scenario (the probe threads expect the HT to be built).

I think we should strive to not spill build partitions in the hash join node to maintain a
clean separation, but I realize this would be a drastic departure from the existing code since
we re-partition spilled partitions lazily during the probe. 

If we had reservations, it seems like we could determine the re-partitioning requirements
before the probe.


Line 959:       RETURN_IF_ERROR(builder_->SpillPartition(&spilled_partition_idx));
It feels like there must be a simpler and more efficient way to handle this case - we are
going to spill the largest partition here, dropping the constructed HT.

At the same time it's really an edge case, so probably better to defer. It's certainly better
than before :)


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 79:   /// Logically, the algorithm has the following modes:.
would be good to document somewhere (not necessarily here) what phases are executed where,
i.e., in the builder or the join node. seems like we'll end up with the build phase finishing
the level0 partitioning and HT construction, and then the join may need to repartition probe
and build partitions


Line 445:     /// Close the partition and attach resources to 'batch' if non-NULL or free
the
can you make the comment on BuildPartition::Close() similar to this one? this new one is much
better


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

Line 231:   /// Prepares the stream for reading. Called after Init() and before the first
AddRow()
maybe expand to:

Prepares the stream for writing by attempting to pin a block to buffer the writes.


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 400:   AddChild(child, indent);
shouldn't the addition and the swap be atomic under children_lock_?

the interfaces existing interfaces seem to make atomicity difficult (AddChild() takes and
releases lock)


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message