hive-issues mailing list archives

From "Khaja Hussain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17390) Select count(distinct) returns incorrect results using tez
Date Fri, 25 Aug 2017 14:52:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141708#comment-16141708 ]

Khaja Hussain commented on HIVE-17390:
--------------------------------------

Thanks, Brian, for filing the bug.

> Select count(distinct) returns incorrect results using tez
> ----------------------------------------------------------
>
>                 Key: HIVE-17390
>                 URL: https://issues.apache.org/jira/browse/HIVE-17390
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 1.2.1
>            Reporter: Brian Goerlitz
>
> With the following combination of settings, select count(distinct) will return the results of select sum(distinct).
> hive.execution.engine=tez
> hive.optimize.reducededuplication=true
> hive.optimize.reducededuplication.min.reducer=1
> hive.optimize.distinct.rewrite=true
> hive.groupby.skewindata=false
> hive.vectorized.execution.reduce.enabled=true
> STEPS TO REPRODUCE:
> {quote}CREATE TABLE `simple_data`(ppmonth int, sale double);
> INSERT INTO simple_data VALUES (501,25000.0),(502,60000.0),(501,40000.0),(502,70000.0),(501,35000.0),(502,60000.0);
> set hive.execution.engine=tez;
> set hive.optimize.reducededuplication=true;
> set hive.optimize.reducededuplication.min.reducer=1;
> set hive.optimize.distinct.rewrite=true;
> set hive.groupby.skewindata=false;
> set hive.vectorized.execution.reduce.enabled=true;
> select count(distinct ppmonth) from simple_data;{quote}
> Returns 1003 rather than 2
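
For reference, the reported value matches the description above: the distinct ppmonth values are 501 and 502, so a distinct count should give 2, while a distinct sum gives 1003. The following is a minimal Python sketch of that arithmetic only, using the rows from the INSERT statement; it is not Hive code, and the variable names are illustrative.

```python
# Rows copied from the INSERT statement in the report: (ppmonth, sale)
rows = [
    (501, 25000.0), (502, 60000.0), (501, 40000.0),
    (502, 70000.0), (501, 35000.0), (502, 60000.0),
]

# Collect the distinct ppmonth values, as DISTINCT would
distinct_ppmonth = {ppmonth for ppmonth, _ in rows}

# Expected result of: select count(distinct ppmonth) from simple_data
count_distinct = len(distinct_ppmonth)

# Result of: select sum(distinct ppmonth) from simple_data
sum_distinct = sum(distinct_ppmonth)

print(count_distinct, sum_distinct)  # prints: 2 1003
```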



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
