impala-reviews mailing list archives

From "Impala Public Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Date Fri, 03 Feb 2017 00:30:35 GMT
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
......................................................................


IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

If all the return values of a Case expression have a known number of
distinct values (i.e. they are constant or statistics exist), then
the number of distinct values for the Case can be computed using this
information.

For the value from Case to be used at higher levels of the expression tree,
the implementation of computeNumDistinctValues for Expr needed to change.
Previously, Expr calculated the number of distinct values by finding any
SlotRefs in its tree and taking the maximum of the distinct values from
those SlotRefs. That approach ignored the value computed by CaseExpr. To
fix this, Expr now takes the maximum number of distinct values across all
of its children.
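
The change to Expr can be pictured with a small stand-alone sketch. It is not
the Impala code: the Node class, its leaf/op factories, and the ndv method are
hypothetical names used only for illustration. A leaf stands for anything that
reports its own number of distinct values (a SlotRef with statistics, or a
CaseExpr that has already computed its value); every other node takes the
maximum across its children.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: a toy expression tree, not Impala's Expr hierarchy.
final class ExprNdvSketch {
    static final long UNKNOWN = -1; // same convention as the message: -1 = not known

    static final class Node {
        final long ownNdv;         // used only when the node has no children
        final List<Node> children;

        Node(long ownNdv, List<Node> children) {
            this.ownNdv = ownNdv;
            this.children = children;
        }

        // A node that reports its own NDV (e.g. a column or an evaluated CASE).
        static Node leaf(long ndv) {
            return new Node(ndv, Collections.<Node>emptyList());
        }

        // A node whose NDV is derived from its children (e.g. a function call).
        static Node op(Node... children) {
            return new Node(UNKNOWN, Arrays.asList(children));
        }
    }

    // Maximum NDV across all children, applied recursively.
    static long ndv(Node n) {
        if (n.children.isEmpty()) return n.ownNdv;
        long max = UNKNOWN;
        for (Node child : n.children) {
            max = Math.max(max, ndv(child));
        }
        return max;
    }

    public static void main(String[] args) {
        Node caseExpr = Node.leaf(2);        // CASE ... 'yes' ELSE 'no' END
        Node charLength = Node.op(caseExpr); // char_length(<case>)
        System.out.println(ndv(charLength)); // prints 2
    }
}
```

Because the wrapping function asks its child for a value instead of searching
the subtree for SlotRefs, the 2 computed for the CASE is what propagates
upward in this sketch.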

-- explaining this statement shows cardinality = 2
explain select distinct case when id = 1 then 'yes' else 'no' end
from functional.alltypes;

-- explaining this statement shows cardinality = 2
explain select distinct char_length(case when id = 1 then 'yes' else 'no' end)
from functional.alltypes;

-- explaining this statement shows cardinality = 7300
explain select distinct case when id = 1 then 0 else id end
from functional.alltypes;

-- explaining this statement shows cardinality = 737 (date_string_col has lower
-- cardinality than id)
explain select distinct case when id = 1 then 'yes' else date_string_col end
from functional.alltypes;

When the number of distinct values is not known for every output, this
returns -1, indicating that the number of distinct values is not known. The
inputs (the WHEN expressions) are not used for calculating the number of
distinct values.
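
A minimal sketch of the Case computation follows. The message does not spell
out how the per-output values are combined, so the rule used here (number of
distinct constant outputs plus the largest non-constant value) is an
assumption. It was chosen because it reproduces 737 if date_string_col has 736
distinct values, and that 736 is also an assumed input. Output, constant,
column, and caseNdv are hypothetical names, not Impala APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: models the THEN/ELSE outputs of a CASE, never the WHENs.
final class CaseNdvSketch {
    static final long UNKNOWN = -1;

    static final class Output {
        final String constantValue; // non-null when the output is a constant
        final long ndv;             // NDV of a non-constant output, or UNKNOWN

        private Output(String constantValue, long ndv) {
            this.constantValue = constantValue;
            this.ndv = ndv;
        }

        static Output constant(String value) { return new Output(value, 1); }
        static Output column(long ndv) { return new Output(null, ndv); }
    }

    // Assumed rule: distinct constants + max NDV of non-constant outputs.
    // Returns UNKNOWN as soon as any output has no known NDV.
    static long caseNdv(List<Output> outputs) {
        Set<String> constants = new HashSet<>();
        long maxColumnNdv = 0;
        for (Output o : outputs) {
            if (o.constantValue != null) {
                constants.add(o.constantValue);
            } else if (o.ndv == UNKNOWN) {
                return UNKNOWN;
            } else {
                maxColumnNdv = Math.max(maxColumnNdv, o.ndv);
            }
        }
        return constants.size() + maxColumnNdv;
    }

    public static void main(String[] args) {
        // case when id = 1 then 'yes' else 'no' end
        System.out.println(caseNdv(Arrays.asList(
            Output.constant("yes"), Output.constant("no"))));     // prints 2
        // case when id = 1 then 'yes' else date_string_col end (736 assumed)
        System.out.println(caseNdv(Arrays.asList(
            Output.constant("yes"), Output.column(736))));        // prints 737
        // same shape, but the column has no statistics
        System.out.println(caseNdv(Arrays.asList(
            Output.constant("yes"), Output.column(UNKNOWN))));    // prints -1
    }
}
```

The WHEN conditions never appear in the sketch, mirroring the statement that
the inputs are not used for the calculation.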

Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Reviewed-on: http://gerrit.cloudera.org:8080/5768
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
3 files changed, 185 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
