hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-5657) TopN produces incorrect results with count(distinct)
Date Wed, 30 Oct 2013 08:07:26 GMT

     [ https://issues.apache.org/jira/browse/HIVE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-5657:
------------------------------

    Attachment: D13797.1.patch

navis requested code review of "HIVE-5657 [jira] TopN produces incorrect results with count(distinct)".

Reviewers: JIRA

HIVE-5657 TopN produces incorrect results with count(distinct)

The attached patch illustrates the problem.
The limit_pushdown test has various other cases of aggregations and distincts, including
count(distinct), that work correctly, so something must be special about this case. (That
said, the src dataset is bad for testing these things because every count, for example,
produces only one record.)
I am not very familiar with the distinct code and its nuances; if someone knows a quick fix,
feel free to take this. Otherwise I will probably start looking next week.
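
The report does not state the root cause, so the following is only a sketch of one plausible way top-N pruning can interact badly with count(distinct). It is not taken from the patch or from Hive's code. It assumes, purely for illustration, that pruning is applied to a composite (group key, distinct value) pair before aggregation; the class TopNDistinctSketch, its Row type, and both methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Hive.
public class TopNDistinctSketch {

    // One input row: a group-by key and the column under count(distinct).
    static final class Row {
        final String key;
        final int value;
        Row(String key, int value) { this.key = key; this.value = value; }
    }

    // Reference semantics: count(distinct value) per key, then keep the
    // first `limit` groups in ascending key order.
    static Map<String, Integer> exact(List<Row> rows, int limit) {
        TreeMap<String, Set<Integer>> groups = new TreeMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.key, k -> new HashSet<>()).add(r.value);
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Integer>> e : groups.entrySet()) {
            if (out.size() == limit) break;
            out.put(e.getKey(), e.getValue().size());
        }
        return out;
    }

    // Assumed faulty behaviour: keep only the `limit` smallest
    // (key, value) pairs BEFORE aggregating, then aggregate what is left.
    static Map<String, Integer> prunedOnCompositeKey(List<Row> rows, int limit) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.key)
                              .thenComparingInt(r -> r.value));
        List<Row> kept = sorted.subList(0, Math.min(limit, sorted.size()));
        return exact(kept, limit);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", 1), new Row("a", 2), new Row("a", 3), new Row("b", 1));
        System.out.println(exact(rows, 2));                // {a=3, b=1}
        System.out.println(prunedOnCompositeKey(rows, 2)); // {a=2}
    }
}
```

Under this assumption the pruned path keeps only the pairs (a,1) and (a,2), so it undercounts group a and drops group b entirely. Whether HIVE-5657 fails in this way is not established by this message.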

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D13797

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/limit_pushdown.q
  ql/src/test/queries/clientpositive/limit_pushdown_negative.q
  ql/src/test/results/clientpositive/limit_pushdown.q.out
  ql/src/test/results/clientpositive/limit_pushdown_negative.q.out
  serde/src/java/org/apache/hadoop/hive/serde2/KeySerializer.java
  serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
  serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java

To: JIRA, navis


> TopN produces incorrect results with count(distinct)
> ----------------------------------------------------
>
>                 Key: HIVE-5657
>                 URL: https://issues.apache.org/jira/browse/HIVE-5657
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Navis
>            Priority: Critical
>         Attachments: D13797.1.patch, example.patch, HIVE-5657.1.patch.txt
>
>
> The attached patch illustrates the problem.
> The limit_pushdown test has various other cases of aggregations and distincts, including
> count(distinct), that work correctly, so something must be special about this case. (That
> said, the src dataset is bad for testing these things because every count, for example,
> produces only one record.)
> I am not very familiar with the distinct code and its nuances; if someone knows a quick fix,
> feel free to take this. Otherwise I will probably start looking next week.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
