hive-dev mailing list archives

From "Phabricator (JIRA)" <>
Subject [jira] [Commented] (HIVE-5657) TopN produces incorrect results with count(distinct)
Date Thu, 31 Oct 2013 00:13:25 GMT


Phabricator commented on HIVE-5657:

sershe has commented on the revision "HIVE-5657 [jira] TopN produces incorrect results with count(distinct)".

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/
    So this now supports any number of distincts?
  ql/src/java/org/apache/hadoop/hive/ql/exec/
    Right now this only returns forward... is this by design?
  ql/src/java/org/apache/hadoop/hive/ql/exec/
    Should all of this also be done for the vectorized path?
  ql/src/java/org/apache/hadoop/hive/ql/exec/
    I fixed it in my patch for vectorized... why is the hash needed here?
    If the row is excluded we don't need the hash; it's only needed when we store the value or collect.
  ql/src/java/org/apache/hadoop/hive/ql/exec/
    If index >= 0, this should store the value.
  ql/src/java/org/apache/hadoop/hive/ql/exec/
    Previously there was just the key, which was some columns and optionally one distinct.
    Do I read correctly that the distribution key is now the same, just without the distinct?
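
The point about the hash in the comments above can be illustrated with a minimal sketch. This is not Hive's actual TopN code: the class `TopNSketch`, its methods `offerKey`, `hashOf`, and `process`, and the `EXCLUDED` constant are all hypothetical names, and the filter is simplified to keep the N smallest string keys, with ties against the current largest retained key treated as excluded. It only shows the ordering being asked for: decide whether the row is excluded first, and compute the hash only when index >= 0.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical, simplified top-N filter; an illustration, not Hive's implementation. */
public class TopNSketch {
    /** Returned when a key cannot be among the N smallest seen so far. */
    public static final int EXCLUDED = -1;

    private final int limit;
    // Max-heap: the largest retained key is at the head, so it is evicted first.
    private final PriorityQueue<String> retained =
            new PriorityQueue<>(Comparator.reverseOrder());
    private int hashCalls = 0;

    public TopNSketch(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    /** Tries to retain the key; returns EXCLUDED or a non-negative index. */
    public int offerKey(String key) {
        if (retained.size() >= limit) {
            if (key.compareTo(retained.peek()) >= 0) {
                return EXCLUDED; // not smaller than the current largest: drop it
            }
            retained.poll(); // evict the current largest to make room
        }
        retained.add(key);
        return retained.size() - 1;
    }

    /** Stand-in for a potentially expensive hash computation; counts its calls. */
    public int hashOf(String key) {
        hashCalls++;
        return key.hashCode();
    }

    /** Computes the hash only for rows that were not excluded. */
    public int process(String key) {
        int index = offerKey(key);
        if (index == EXCLUDED) {
            return EXCLUDED; // excluded row: the hash is never computed
        }
        int hash = hashOf(key); // needed only when the value is stored or collected
        // A real operator would store or collect the row together with this hash here.
        return index;
    }

    public int getHashCalls() {
        return hashCalls;
    }

    public static void main(String[] args) {
        TopNSketch topN = new TopNSketch(2);
        for (String key : new String[] {"b", "c", "d", "a"}) {
            System.out.println(key + " -> " + topN.process(key));
        }
        System.out.println("hash computations: " + topN.getHashCalls());
    }
}
```

In this sketch the excluded row never reaches `hashOf`, which is the behavior the comment suggests; whether that is safe in the actual patch depends on where else the hash is consumed.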


To: JIRA, navis
Cc: sershe

> TopN produces incorrect results with count(distinct)
> ----------------------------------------------------
>                 Key: HIVE-5657
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Navis
>            Priority: Critical
>         Attachments: D13797.1.patch, example.patch, HIVE-5657.1.patch.txt
> Attached patch illustrates the problem.
> The limit_pushdown test has various other cases of aggregations and distincts, incl. count-distinct,
> that work correctly (that said, the src dataset is bad for testing these things because every
> count, for example, produces only one record), so something must be special about this case.
> I am not very familiar with the distinct code and these nuances; if someone knows a quick
> fix, feel free to take this; otherwise I will probably start looking next week.

