Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 0CD2B200CC4 for ; Thu, 29 Jun 2017 00:41:30 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 0BD54160BFA; Wed, 28 Jun 2017 22:41:30 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 52EB1160BF7 for ; Thu, 29 Jun 2017 00:41:29 +0200 (CEST) Received: (qmail 61520 invoked by uid 500); 28 Jun 2017 22:41:27 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 61483 invoked by uid 99); 28 Jun 2017 22:41:25 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 28 Jun 2017 22:41:25 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 638DA180412 for ; Wed, 28 Jun 2017 22:41:25 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id JFLIusNYqkUO for ; Wed, 28 Jun 2017 22:41:24 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 3CEEE5FD91 for ; Wed, 28 Jun 2017 22:41:23 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v5SMfMEE019448; Wed, 28 Jun 2017 22:41:22 GMT Message-Id: <201706282241.v5SMfMEE019448@ip-10-146-233-104.ec2.internal> Date: Wed, 28 Jun 2017 22:41:22 +0000 From: "Dimitris Tsirogiannis (Code Review)" To: Alex Behm , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Mostafa Mokhtar , Zach Amsden Reply-To: dtsirogiannis@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5547=3A_Rework_FK/PK_join_detection=2E=0A?= X-Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 X-Gerrit-ChangeURL: X-Gerrit-Commit: fe00fb4dce1016719b1523863a072f3e9f443b3c In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Wed, 28 Jun 2017 22:41:30 -0000 Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. ...................................................................... Patch Set 1: (17 comments) I have one comment about testing. We have lots of test changes but it's not easy to verify if the end result makes sense or not. I think it may worth having a planner test with explain_level > 2 that covers a few interesting join cases so that we can see what kind of pk/fk conjuncts are detected and what the impact is on estimated join cardinalities. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) { : output.append("assumed fk/pk"); I haven't read the entire patch yet, so I am curious what this case represents. PS1, Line 178: for (int j = 0; j < slots.size(); ++j) { : output.append(slots.get(j).toString()); : if (j + 1 != slots.size()) output.append(", "); : } Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots) http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS1, Line 37: import org.apache.kudu.client.shaded.com.google.common.collect.Maps; ? PS1, Line 271: scanSlotsByTids nit: Each pair represents two joined tables, no? So, maybe rename this to 'scanSlotsByJoinedTids'? A bit more explicit on what this thing represents. PS1, Line 281: numRows rhsTableCardinality? Line 290: nit: extra line PS1, Line 294: conjuncts conjunct slots PS1, Line 300: Preconditions.checkState(joinOp_.isInnerJoin() || joinOp_.isOuterJoin()); This private function is only called in L253? I think you can remove this check. PS1, Line 312: lhsNdv Can this ever be 0? PS1, Line 313: rhsNumRows Same here, can this be 0? PS1, Line 318: result = Math.min(result, joinCard); That part I think is very important and I am not sure it is explained anywhere. Maybe expand the comment in L209? PS1, Line 329: conjuncts conjunct slots PS1, Line 341: slots.lhs().getStats().getNumDistinctValues(); nit: these are used in couple of places. Since you already have the EqJoinConjunctScanSlots class, why don't your wrap them in some nice util functions there (e.g. getLhsNdvs() or something like that). PS1, Line 343: slots.lhs().getParent().getTable().getNumRows(); same comment as above. PS1, Line 373: Preconditions I don't think these checks are useful. This private constructor is only called through the static function that has tight control on what gets passed in this ctor. PS1, Line 382: . nit: remove PS1, Line 399: tupleDesc nit: inline -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes