impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Thu, 15 Sep 2016 22:11:45 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................


Patch Set 14:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

PS14, Line 45: build row partitions
> is this the same as the "hash partitions" referred to in the next paragraph
really the streams, reworded.


Line 86: 
> is it expected that the renaming public functions would only be used by PHJ
Done


PS14, Line 90: Acquire ownership
> Doesn't the ownership pass from this class to the exec node?  In which case
It passes from the class to the caller. I guess I don't find this ambiguous given the return
type and can't think of a non-awkward alternative.


PS14, Line 95: Called after probing of the partitions
             :   /// is done. The partitions are not closed or destroyed, since they may be
spilled or
             :   /// may contain unmatched build rows for certain join modes (e.g. right outer
join).
> The fact that we need some much explanation kinda makes this seem like it m
The hash join node uses this to determine whether it's already cleaned up the hash partitions,
and to keep the build/probe hash partition vectors in sync.

Agree this isn't the interface we ultimately want but I wanted to avoid additional restructuring
of GetNext()/CleanupHashPartitions().


Line 106:   }
> nit: missing blank line
Done


PS14, Line 109: When this call returns
> Is this stuff by this method or the caller?  i.e. if this method closes inp
Done


PS14, Line 112: so that the probe phase has enough buffers preallocated
              :   /// execute successfully.
> slightly garbled. seems like a word is missing or something.
Done


PS14, Line 117: /// Iterates over all the partitions in 'hash_partitions_' and returns the
largest
              :   /// number of build rows.
> rather than say how and referring to private member, and leaving it to the 
Done


Line 123:   bool HashTableStoresNulls() const;
> Both classes already have their own hash table context.
Also, I think this makes sense to have here since the builder manages the hash tables.


PS14, Line 182: partition side of this partition
> not sure what this is saying. maybe just "for this build partition"?
Done


Line 188:     /// false.
> maybe clarify that an error is not returned in that case.  or clarify what 
Done


PS14, Line 195: unpin_all_build
> why "build"?  shouldn't it be unpin_all_blocks?  or even better would be to
Overall these bool arguments are really confusing, and the correctness of the spilling depends
a lot on using the right value in the right place. I ended up changing this significantly
to use an enum all the way down to the underlying BufferedTupleStream method, and adding this
to SpillPartition(). It turns out there was a bug in BuildHashTablesAndPrepareProbeStreams()
where it should have been unpinning all of the blocks in the partition it chose to spill but
wasn't. 

I'm rerunning the memory experiment just to confirm this doesn't change anything.


PS14, Line 234: build or probe
> does this care about the probe side partitioning?
Yes, since we share the reservation and hand off the probe buffers.


PS14, Line 244: all hash partitions for partitioning level
> this is kinda misleading. maybe qualify with ... for the current input, or 
Done


PS14, Line 252: in
> into
Done


PS14, Line 256: in memory
> what does this mean? is it trying to say it fits in the current (pinned) bl
Reworded it - I think it makes more sense to think of it as whether appending to the stream
succeeds (which may advance the block) or whether we have to start spilling things.


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message