impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Tue, 13 Sep 2016 23:26:00 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................


Patch Set 14:

(25 comments)

Here's a first set of comments, mostly focused on phjn.h.  I need to go back through phjn.cc,
builder.h/cc.

http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 207:         state, Substitute(PREPARE_FAILED_ERROR_MSG, "write"));
old code and agg uses state_->block_mgr()->MemLimitTooLowError().  Any reason this uses
MemLimitExceeded() directly instead? Just wondering the reasoning.


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 79:   /// Logically, the algorithm has the following modes:.
I think this comment could be a lot more useful if we explain how the phases work together
to give the full algorithm.  Probably that could replace the state transition graph which
I find confusing.


PS14, Line 81: partition them into hash_partitions_. The input can 
this comment is kind of disjoint since it talks about what the phase does, then where the
input comes from, and then explains more about what the phase does.  Let's format each phase
consistently, maybe describe inputs first, then actions, then outputs?


PS14, Line 86: PROCESSING
why do we use PROCESSING term here rather than PARTITIONING, which would be more consistent?


PS14, Line 87: spilled
single spilled
(or remove single from #3 since having it in one place but not the other makes the reader
wonder if there's a difference).


Line 89:   ///      table in memory, we perform the join, otherwise we spill the probe row.
this last sentence is confusing since it mostly restates what the first says adds a small
amount of new info. Let's combine them.


PS14, Line 90: PROBING_SPILLED_PARTITION
it would be good to be more upfront about when this phase is used and how, for a spilled partition,
we either do this phase or REPARTITIONING_PROBE phase.


PS14, Line 90: construct
what does this phase construct? isn't the hash table already constructed?


PS14, Line 91: walking
what does this mean? processing? 

also, from these comments it's not clear how these stages relate. You could read this to mean
that #2 a fallback if we can't do #3, or that #3 is the probing that "perform the join" of
#2.


PS14, Line 97: *
what does this star mean to say? We can go from REPARTITIONING_PROBE back to PROBING_SPILLED_PARTITION,
right?


PS14, Line 143: buffer
write buffer?


PS14, Line 264: the
its


PS14, Line 267: the
that


PS14, Line 291: Walks
What does this mean. Iterates over?


PS14, Line 291: hash partitions
is this talking about the probe hash partitions or the builder's?


PS14, Line 384: builder_->hash_partitions
is this referring to the entries in the builder_->hash_partition_ array?


PS14, Line 385: This is not used when processing a single partition.
what does this mean?


PS14, Line 410:  we then iterate over all the probe rows
I can't tell if this means literally all the probe rows, or all the rows in this stream. 
clarify this paragraph.


PS14, Line 432: and
this makes it sound like there are two conditions required to create a probe partition, but
I think you mean this is implied by the first part.  "because"?


PS14, Line 436: preallocated
what does this mean?


PS14, Line 437: and
double and.  And is this saying the constructor prepares it or the caller should have already?
 is this right:

... should be an unpinned stream that has been prepared for writing with ...


PS14, Line 443: should
will


PS14, Line 464: meaning
              :     /// it has to call Close() on it) but transferred to the parent exec node
this doesn't seem right. don't we either Close() it or transfer it (not both), in PartitionedHashJoinNode::ProbePartition::Close()?


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS14, Line 409: DCHECK
DCHECK_EQ


Line 412:   children_.insert(children_.begin(), child_entry);
rather than having two special cases (here and line 395-399, how about making AddChildLocked()
take the position iterator? This case passes begin() and then AddChild() case passes either
end() or the middle iterator.


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message