Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id B35D5200CBD for ; Thu, 22 Jun 2017 02:50:50 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id B1C80160BF0; Thu, 22 Jun 2017 00:50:50 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id CA769160BD5 for ; Thu, 22 Jun 2017 02:50:49 +0200 (CEST) Received: (qmail 36241 invoked by uid 500); 22 Jun 2017 00:50:49 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 36228 invoked by uid 99); 22 Jun 2017 00:50:48 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Jun 2017 00:50:48 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 35FE0C00A9 for ; Thu, 22 Jun 2017 00:50:48 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.363 X-Spam-Level: X-Spam-Status: No, score=0.363 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 8CIzz8fWD1q4 for ; Thu, 22 Jun 2017 00:50:46 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 8369D5FC99 for ; Thu, 22 Jun 2017 00:50:46 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v5M0oi4Y011062; Thu, 22 Jun 2017 00:50:44 GMT Message-Id: <201706220050.v5M0oi4Y011062@ip-10-146-233-104.ec2.internal> Date: Thu, 22 Jun 2017 00:50:44 +0000 From: "Alex Behm (Code Review)" To: Tim Armstrong , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Juan Yu , Michael Ho Reply-To: alex.behm@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5483=3A_Automatically_disable_codegen_for_small_queries=0A?= X-Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e X-Gerrit-ChangeURL: X-Gerrit-Commit: 70fb13dee2ee03556b5fa1002a789b3ce0ae76d3 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Thu, 22 Jun 2017 00:50:50 -0000 Alex Behm has posted comments on this change. Change subject: IMPALA-5483: Automatically disable codegen for small queries ...................................................................... Patch Set 3: (18 comments) http://gerrit.cloudera.org:8080/#/c/7153/3//COMMIT_MSG Commit Message: PS3, Line 15: a plan node > I hear your concerns and I do think we should think it through carefully. I Unless we are sure this is a common user scenario, I lean towards a default value of 0. I don't think the risk of regressions is worth the unknown benefit. I agree with Tim that the behavior is easy to explain, but it still means users might have to deal with tuning this value after upgrading. http://gerrit.cloudera.org:8080/#/c/7153/3/be/src/service/query-options.cc File be/src/service/query-options.cc: Line 497: query_options->__set_disable_codegen_rows_threshold(atoi(value.c_str())); Consider using StringParser::StringToInt() like in other places here, so we can properly validate the input. http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 255: // If the number of rows per node is below the threshold codegen will be automatically If the number of rows processed per node ... (to make it clear that these are the input rows and not the output rows) http://gerrit.cloudera.org:8080/#/c/7153/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 279: // If the number of rows per node is below the threshold and disable_codegen is unset, If the number of rows processed per node http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 151: Needs PlannerTests. You can do something like PlannerTest.testComputeStatsMtDop(). Line 152: checkForDisableCodegen(rootFragment.getPlanRoot()); Add comment why we have to do this on the fragmented plan Line 477: boolean isSmallQuery = visitor.valid() && visitor.getMaxRowsProcessed() < threshold; remove visitor.valid() and inline result into the if Line 492: private void checkForDisableCodegen(PlanNode planRoot) { planRoot -> distributedPlan Line 493: // Determine the maximum number of rows processed by a instance of a plan node. This comments reads like the comment on MaxRowsProcessedVisitor.maxRowsProcessed_. I think we need crisper explanations for the two maximum numbers. http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: Line 32: // True if we should abort because if we don't have valid estimates remove second "if" Line 47: if (caller instanceof HashJoinNode || caller instanceof NestedLoopJoinNode) { if (caller instanceof JoinNode) foundJoinNode_ = true; Line 60: || (missingStats && !scan.hasLimit() && scan.getConjuncts().isEmpty())) { Why would the existence of scan conjuncts make our estimate valid despite missing stats? Line 66: Math.max(maxRowsProcessedPerNode_, numRows / numNodes); Nit: Do the division as double and take the ceil, otherwise it seems like we're losing rows. Line 77: Math.max(maxRowsProcessedPerNode_, numRows / numNodes); Nit: Do the division as double and take the ceil, otherwise it seems like we're losing rows. Line 81: public boolean valid() { single line Line 96: return foundJoinNode_; Preconditions.checkState(valid_); seems appropriate here as well http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test File testdata/workloads/functional-query/queries/QueryTest/codegen.test: Line 2: ---- QUERY Any reason not to make these PlannerTests? http://gerrit.cloudera.org:8080/#/c/7153/3/tests/common/test_dimensions.py File tests/common/test_dimensions.py: Line 139: disable_codegen_rows_threshold_options=[5000], Why not 0? For the same reason you use 0 below. -- To view, visit http://gerrit.cloudera.org:8080/7153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes