impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Date Thu, 29 Jun 2017 07:27:42 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.

Patch Set 1:


I still need to add the targeted planner tests and adjust new tests after the rebase. In the
meantime, you can continue looking at the code changes.
File fe/src/main/java/org/apache/impala/planner/

PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) {
             :           output.append("assumed fk/pk");
> I haven't read the entire patch yet, so I am curious what this case represe
In this case we have no equi-join conjuncts that we can reason about, so we are optimistic
and assume fk/pk with a join selectivity of 1.

We cannot reason about an equi-join conjunct if the underlying table/columns have no stats
or if the conjunct involves non-trivial expressions (anything that is not a SlotRef).

PS1, Line 178: for (int j = 0; j < slots.size(); ++j) {
             :               output.append(slots.get(j).toString());
             :               if (j + 1 != slots.size()) output.append(", ");
             :             }
> Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots)
Better. Done.
File fe/src/main/java/org/apache/impala/planner/

PS1, Line 37: import;
> ?
Sigh. Fixed.

Line 91:   protected List<List<EqJoinConjunctScanSlots>> fkPkEqJoinConjuncts_;
> Class member is only used in one function in the base class.  Maybe documen
Added comment. Not sure it's worth protecting against accidental modifications in inheriting
classes. I think it's fair to assume that subclasses should treat protected members carefully.
Subclasses can already muck with pretty much anything here, and adding getters that wrap members
with immutables seems cumbersome, and generates objects "unnecessarily".

Line 253:       return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, rhsCard);
> This computes estimates based on both PK conjuncts and non-PK conjuncts usi
1. Changed the code to only pass those FK/PK conjuncts because that makes the behavior and
code clearer. I'm not sure I follow your described scenario. If the RHS is very selective,
then it's correct to apply that to the join cardinality - if we believe that FK/PK is indeed
the case. Are you concerned that our FK/PK assumption could be wrong and we might underestimate
the join cardinality

2. Good idea. A flattened, ordered list makes things simpler in various places. Done.

PS1, Line 271: scanSlotsByTids
> nit: Each pair represents two joined tables, no? So, maybe rename this to '

Line 275:     for (List<EqJoinConjunctScanSlots> fkPkCandi: scanSlotsByTids.values())
> fkPkCandi - this could use a better name, it isn't clear to me what this is
I renamed it to fkPkCandidate, but I doubt that's what you had in mind. Can you help me come
up with a better name? This thing is a list of equi-join conjuncts between the same two tables.
These conjuncts could represent a single- or multi-column FK/PK join condition.

PS1, Line 281: numRows
> rhsTableCardinality?
Used rhsNumRows because we typically use 'numRows' to indicate values that come directly from
stats, although technically table cardinality is also correct of course.

Line 290: 
> nit: extra line

PS1, Line 294: conjuncts
> conjunct slots

PS1, Line 300:     Preconditions.checkState(joinOp_.isInnerJoin() || joinOp_.isOuterJoin());
> This private function is only called in L253? I think you can remove this c

PS1, Line 312: lhsNdv
> Can this ever be 0?
Probably. Better be defensive. Fixed here and elsewhere.

PS1, Line 313: rhsNumRows
> Same here, can this be 0?
Probably. Better be defensive. Fixed here and elsewhere.

PS1, Line 318: result = Math.min(result, joinCard);
> That part I think is very important and I am not sure it is explained anywh
This part applies to both methods (generic and FK/PK). I expanded the comment on getJoinCardinality()
to explain why we take the min().

We should certainly try (in a subsequent patch) whether estimating the joint selectivity of
multiple predicates with exponential backoff or similar works well.

PS1, Line 329: conjuncts
> conjunct slots

PS1, Line 341: slots.lhs().getStats().getNumDistinctValues();
> nit: these are used in couple of places. Since you already have the EqJoinC

PS1, Line 343: slots.lhs().getParent().getTable().getNumRows();
> same comment as above.

PS1, Line 373: Preconditions
> I don't think these checks are useful. This private constructor is only cal

PS1, Line 382: .
> nit: remove

PS1, Line 399: tupleDesc
> nit: inline
Removed tupleDesc instead, otherwise the if condition below becomes multi-line and harder
to read imo.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Zach Amsden <>
Gerrit-HasComments: Yes

View raw message