impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
Date Thu, 15 Sep 2016 23:59:26 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................


Patch Set 14:

> (2 comments)
 > 
 > Before I responded to individual comments I wanted to point out
 > that the coupling is now one way - the builder never refers to
 > PhjNode aside from metadata provided via the constructor (you can
 > verify there are no references to partitioned-hash-join-node.h or
 > PartitionedHashJoin node in the builder code).

Yup, that's good.  I thought there was a back reference on my first pass through, but I was
thinking of the reference from partition back to the builder/sink.  Since it is the sink,
it seems a little weird (once the builder is used by the join node in the other fragment),
but okay for this first step.  Splitting out the sink and builder is something we should revisit
later.

 > 
 > At this point it should only require minor changes to this hash
 > join code to create the builder in a separate fragment *before* the
 > join node, provided you have a 1:1 mapping from builder to node.
 > 1:n broadcast joins with no spilling would require some more
 > changes, but is a lot closer.
 > 

Yup, agree.

 > I chose this stopping point because there was a clear physical
 > separation between the classes, even if the spilling algorithm has
 > some logical coupling and assumes a 1:1 mapping from node to
 > builder.
 > 
 > Reducing the logical coupling would require significant changes to
 > the spilling algorithm that is driven from PartitionedHashJoin::GetNext(),
 > which I think should be deferred until there is some consensus
 > about what spilling for 1:n broadcast joins with multithreading
 > might look like.

I'm okay with this point more or less as an incremental first step, as well. I just wanted
to understand better where we are going longer term and what the immediate requirements are.

As mentioned, looking at whether the sink should be just a sink, and then the "builder" is
something that's driven by both the sink and also the join node.  But I'm fine with waiting
on that since this is a good step in teasing things apart.

-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: No

Mime
View raw message