impala-reviews mailing list archives

From "Bharath Vissapragada (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5612: join inversion should factor in parallelism
Date Fri, 07 Jul 2017 23:32:48 GMT
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5612: join inversion should factor in parallelism

Patch Set 2:


I have some minor comments; otherwise the patch looks OK to me.
File fe/src/main/java/org/apache/impala/planner/

Line 386:    *    cardinality*avgSerializedSize. Do not invert if relevant stats are missing.
Update comment to add the 4th case?

PS2, Line 424: invertedJoinIsCheaper

PS2, Line 459: (log_b(rhsBytes) + C) * (lhsCard + 2 * rhsCard)
What is the unit of this? In other words, what exactly are we trying to optimize per node?
Based on the last point above, I thought it would look something like:

((log_b(rhsBytes) * lhsCard) + 2 * rhsCard) * C

My understanding was more like this: for each probe row we look up the hash table (= ~(log_b(rhsBytes) * lhsCard)),
building the hash table costs 2 * rhsCard, and C is the fixed cost. Am I missing something?
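To make the two readings easier to compare, here is a minimal Java sketch of both per-node cost formulas side by side. The class and method names (JoinCostFormulas, patchCost, reviewerCost) are hypothetical. C = 5 is assumed to mirror CONSTANT_COST_PER_ROW at Line 488. This is not the code in the patch.

```java
/** Hypothetical sketch of the two cost formulas under discussion; not the patch's code. */
public final class JoinCostFormulas {
    // Assumed fixed per-row cost C, mirroring CONSTANT_COST_PER_ROW = 5 at Line 488.
    static final double C = 5;

    /** log_b(x), computed via natural logarithms. */
    static double logBase(double x, double base) {
        return Math.log(x) / Math.log(base);
    }

    /** PS2, Line 459: (log_b(rhsBytes) + C) * (lhsCard + 2 * rhsCard). */
    static double patchCost(double lhsCard, double rhsCard, double rhsBytes, double base) {
        return (logBase(rhsBytes, base) + C) * (lhsCard + 2 * rhsCard);
    }

    /**
     * Reviewer's reading: ((log_b(rhsBytes) * lhsCard) + 2 * rhsCard) * C, i.e. a
     * log-cost lookup per probe row plus 2 units per build row, all scaled by C.
     */
    static double reviewerCost(double lhsCard, double rhsCard, double rhsBytes, double base) {
        return (logBase(rhsBytes, base) * lhsCard + 2 * rhsCard) * C;
    }

    public static void main(String[] args) {
        // Example inputs: 1M probe rows, 1K build rows, 1 MiB of build-side data, base 2.
        double lhsCard = 1_000_000, rhsCard = 1_000, rhsBytes = 1_048_576;
        System.out.println("patch:    " + patchCost(lhsCard, rhsCard, rhsBytes, 2));
        System.out.println("reviewer: " + reviewerCost(lhsCard, rhsCard, rhsBytes, 2));
    }
}
```

The main difference is where C enters. In the PS2 formula, C is added to the log term and charged on every probe and build row. In the reviewer's reading, the log term applies only to probe rows, and C scales the whole total.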

Line 488:     final long CONSTANT_COST_PER_ROW = 5;
How was this chosen?

PS2, Line 491: log10
Shouldn't this be base 2? I don't think it matters as long as we use the same base for both cases,
but I'm just wondering.
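One way to check whether the base matters is to note that log_b(x) + C = (ln(x) + C * ln(b)) / ln(b). The 1/ln(b) factor is common to both sides of the inversion comparison, so using base b is equivalent to using natural logs with a constant of C * ln(b). The base therefore interacts with C. The minimal Java sketch below applies the PS2, Line 459 formula to both join orders under base 10 and base 2. The names (LogBaseSketch, cost, invertedIsCheaper) are hypothetical. C = 5 is assumed to mirror Line 488, and side bytes are assumed to be cardinality*avgSerializedSize as in the Line 386 comment. The inputs are contrived to sit near the break-even point.

```java
/** Hypothetical sketch: does the log base change the inversion decision? Not the patch's code. */
public final class LogBaseSketch {
    // Assumed fixed per-row cost C, mirroring CONSTANT_COST_PER_ROW = 5 at Line 488.
    static final double C = 5;

    /** log_b(x), computed via natural logarithms. */
    static double logBase(double x, double base) {
        return Math.log(x) / Math.log(base);
    }

    /** PS2, Line 459 formula, with lhs = probe side and rhs = build side. */
    static double cost(double probeCard, double buildCard, double buildBytes, double base) {
        return (logBase(buildBytes, base) + C) * (probeCard + 2 * buildCard);
    }

    /**
     * Inverting swaps the probe and build sides. Bytes are assumed to be
     * cardinality * avgSerializedSize for each side.
     */
    static boolean invertedIsCheaper(double lhsCard, double lhsRowSize,
                                     double rhsCard, double rhsRowSize, double base) {
        double current = cost(lhsCard, rhsCard, rhsCard * rhsRowSize, base);
        double inverted = cost(rhsCard, lhsCard, lhsCard * lhsRowSize, base);
        return inverted < current;
    }

    public static void main(String[] args) {
        // lhs: 100K rows of 1000 bytes (1e8 bytes); rhs: 200K rows of 5 bytes (1e6 bytes).
        System.out.println("base 10: " + invertedIsCheaper(100_000, 1000, 200_000, 5, 10)); // true
        System.out.println("base 2:  " + invertedIsCheaper(100_000, 1000, 200_000, 5, 2));  // false
    }
}
```

With these inputs, the two bases reach opposite decisions. Far from the break-even point, the two bases typically agree. The base effectively rescales C, so choosing the base and choosing CONSTANT_COST_PER_ROW are not independent decisions.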


Gerrit-MessageType: comment
Gerrit-Change-Id: Icacea4565ce25ef15aaab014684c9440dd501d4e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
