Message-Id: <201702030030.v130UZpZ005339@ip-10-146-233-104.ec2.internal>
Date: Fri, 3 Feb 2017 00:30:35 +0000
From: "Impala Public Jenkins (Code Review)"
To: Joe McDonnell, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Subject: [Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
X-Gerrit-MessageType: merged
X-Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
X-Gerrit-Commit: 59cdf6b8f2a6180b727bcb9ee336a65381377ace
User-Agent: Gerrit/2.12.2

Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
......................................................................


IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

If all the return values of a Case expression have a known number of
distinct values (i.e. they are constant or statistics exist), then the
number of distinct values for the Case can be computed using this
information.

In order for the value from Case to be used at higher levels in the
tree, the implementation of computeNumDistinctValues for Expr needed
to change. Previously, Expr calculated the number of distinct values
by finding any SlotRefs in its tree and taking the maximum of the
distinct values from those SlotRefs. This would ignore the value from
CaseExpr. To fix this, Expr now takes the maximum number of distinct
values across all of its children.

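The two rules described above can be sketched roughly as follows. This is not the code from CaseExpr.java or Expr.java: the class NdvSketch, the Output helper, and the method names are hypothetical. The message does not say how a Case combines its outputs, so the rule used here (number of constant outputs plus the largest NDV among the non-constant outputs) is an assumption. It also assumes the constants are all different from each other. Only THEN/ELSE outputs are passed in, since the WHEN inputs are not used.

```java
import java.util.Arrays;
import java.util.List;

public class NdvSketch {
    // -1 marks an unknown number of distinct values, as in the commit message.
    static final long UNKNOWN = -1;

    /** Hypothetical description of one THEN/ELSE output of a CASE. */
    static final class Output {
        final boolean constant;
        final long ndv;

        private Output(boolean constant, long ndv) {
            this.constant = constant;
            this.ndv = ndv;
        }

        static Output constant() { return new Output(true, 1); }
        static Output column(long ndv) { return new Output(false, ndv); }
    }

    /** Generic expression: the largest NDV among its children, UNKNOWN if none is known. */
    static long exprNdv(List<Long> childNdvs) {
        long max = UNKNOWN;
        for (long ndv : childNdvs) {
            max = Math.max(max, ndv);
        }
        return max;
    }

    /**
     * CASE expression: UNKNOWN if any output is unknown; otherwise (assumed rule)
     * the count of constant outputs plus the largest non-constant NDV.
     */
    static long caseNdv(List<Output> outputs) {
        long constants = 0;
        long maxNonConstant = 0;
        for (Output o : outputs) {
            if (o.constant) {
                constants++;            // assumes each constant is a distinct value
            } else if (o.ndv == UNKNOWN) {
                return UNKNOWN;         // missing statistics on one output
            } else {
                maxNonConstant = Math.max(maxNonConstant, o.ndv);
            }
        }
        return constants + maxNonConstant;
    }

    public static void main(String[] args) {
        // case when id = 1 then 'yes' else 'no' end
        long yesNo = caseNdv(Arrays.asList(Output.constant(), Output.constant()));
        System.out.println(yesNo);                          // 2

        // char_length(case ...): the parent takes the maximum over its children
        System.out.println(exprNdv(Arrays.asList(yesNo)));  // 2

        // one output without statistics makes the whole CASE unknown
        System.out.println(caseNdv(Arrays.asList(
            Output.constant(), Output.column(UNKNOWN))));   // -1
    }
}
```

The planner may cap the final estimate elsewhere (for example by the table's row count); that step is outside this sketch.
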

-- explaining this statement shows cardinality = 2
explain select distinct case when id = 1 then 'yes' else 'no' end
from functional.alltypes;

-- explaining this statement shows cardinality = 2
explain select distinct char_length(case when id = 1 then 'yes' else 'no' end)
from functional.alltypes;

-- explaining this statement shows cardinality = 7300
explain select distinct case when id = 1 then 0 else id end
from functional.alltypes;

-- explaining this statement shows cardinality = 737 (date_string_col has lower
-- cardinality than id)
explain select distinct case when id = 1 then 'yes' else date_string_col end
from functional.alltypes;

When the number of distinct values is unknown for any of the outputs,
this returns -1, indicating that the number of distinct values for the
Case is not known. The inputs (whens) are not used for calculating the
number of distinct values.

Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Reviewed-on: http://gerrit.cloudera.org:8080/5768
Reviewed-by: Alex Behm
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
3 files changed, 185 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Marcel Kornacker