Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 7777F200C40 for ; Thu, 9 Mar 2017 02:03:31 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 76031160B86; Thu, 9 Mar 2017 01:03:31 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 99CCA160B83 for ; Thu, 9 Mar 2017 02:03:30 +0100 (CET) Received: (qmail 8991 invoked by uid 500); 9 Mar 2017 01:03:29 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 8980 invoked by uid 99); 9 Mar 2017 01:03:29 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 09 Mar 2017 01:03:29 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 33DDA186280 for ; Thu, 9 Mar 2017 01:03:29 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.363 X-Spam-Level: X-Spam-Status: No, score=0.363 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id zk_kHKfuXZxI for ; Thu, 9 Mar 2017 01:03:27 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 6A5345FC00 for ; Thu, 9 Mar 2017 01:03:27 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v2913QG1002234; Thu, 9 Mar 2017 01:03:26 GMT Message-Id: <201703090103.v2913QG1002234@ip-10-146-233-104.ec2.internal> Date: Thu, 9 Mar 2017 01:03:26 +0000 From: "Alex Behm (Code Review)" To: Thomas Tauber-Marshall , impala-cr@cloudera.com, reviews@impala.incubator.apache.org Reply-To: alex.behm@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4731/IMPALA-397/IMPALA-4728=3A_Materialize_sort_exprs=0A?= X-Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e X-Gerrit-ChangeURL: X-Gerrit-Commit: 29eab800919b41c5130acd0e9770e8e91f01d9fb In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Thu, 09 Mar 2017 01:03:31 -0000 Alex Behm has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs ...................................................................... Patch Set 1: (23 comments) http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: Line 212: // True if this Expr is known to have a deterministic value, false if it is True if this exprs always returns the same value given the same input. False if ... Line 213: // non-determinstic or unknown (eg. UDFs). Set at the end of analyze() and valid if e.g. http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 292: // We currently don't have a way to indicate if a UDF is deterministic, so just always Ideally we should not conflate the UDF and deterministic concepts. For example, one might reasonably think that FunctionCallExpr.isNondeterministicBuiltinFn() should return the opposite of as Expr.isDeterministic() - but they don't and that could lead to confusion. I suggest having two separate members. We can revisit the treatment of UDFs with respect to determinism sometime later. http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 253: if (!(smap.getLhs().get(i) instanceof SlotRef) use {} for multi-line if http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 39: // TODO: run performance tests to determine this value. Remove TODO, instead comment where this value came from, maybe with some examples of exprs that cost 5, also mention what the effect of this tuning parameter is (i.e. all ordering exprs with cost > this will be materialized) Line 46: private List materializedOrderingExprs_; needs a brief comment, in particular why we need to store them Line 172: TreeNode.collect(orderingExprs_, Predicates.instanceOf(SlotRef.class), sourceSlots); We should collect the SlotRefs only for non-materialized ordering exprs. Line 180: // substOrderBy is the mapping from slot refs in the sort node's input to slot refs in update comment to reflect the new contents of the substOrderBy smap Line 196: // The ordering exprs still point to the old slot refs and need to be replaced with update comment, we are not only substituting slot refs Line 208: * SlotRefs to 'sortTupleExprs'. Say something about the args of this function. Also mention side-effects (populates materializedOrderingExprs_) Line 210: * Materialize ordering exprs that are non-deterministic, are more expensive than a cost make this the first sentence in this comment Line 211: * threshold, or don't have a cost set. Also obey the 'materialize_sort' query option, no more hint, right? Line 214: * Returns an ExprSubstitutionMap from the new SlotRefs to the original exprs. from the materialized sort exprs to the new SlotRefs Line 216: public ExprSubstitutionMap createMaterializedOrderExprs( Instead of returning a new smap and combining with the other one, why not pass in the existing smap and directly add entries to it? The 'sortTupleExprs' param is just the lhs of the existing smap, so I think this interface can be made a little simpler / more direct. Then you could use the return value to return the exprs that you chose to materialize. Fewer side-effects are easier to reason about. Line 217: TupleDescriptor sortTupleDesc, List sortTupleExprs, Analyzer analyzer) { analyzer needed? Line 219: //List materializedOrderingExprs = Lists.newArrayList(); remove Line 228: // LiteralExprs don't affect the sort order. TODO: consider removing the sort node Why TODO? What happens when there are no sort exprs at all left? http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 374: public void testSortMaterialization() { testSortExprMaterialization http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test File testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test: Line 19: # literal order expr and cheap order expr don't get materialized I think these tests need more examples cheap and expensive exprs to make sure we mostly do the right thing. For example, there is no test with a naked SlotRef. I suggest cramming more exprs into each individual test case. We also need to test more than 2 sort exprs. Also: Needs to test UDFs and ideally also UDAs (inside the ORDER BY of an analytic function). Let's talk if this is tricky to do with the current test infra. Line 37: select last_value(id) over (order by if(int_col = 0, id, int_col), bool_col) from functional.alltypes Can you ask Greg about some specific expensive ordering exprs that have come up recently? We should make sure that we do the right thing for them. Line 106: | materialized: log2(id) It doesn't seem right to print this here because the merging exchange does not actually materialize the expr. It relies on the preceding Top-N/Sort whatever to do that. http://gerrit.cloudera.org:8080/#/c/6322/1/tests/query_test/test_sort.py File tests/query_test/test_sort.py: Line 165: # "order by random()" should return results in a different order from unordered. Confusing sentence, what is 'unordered'? I think you can remove this and the next two lines Line 168: results = self.execute_query("select id from functional.alltypestiny") remove -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes