hadoop-hive-dev mailing list archives

From "Arvind Prabhakar (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-287) count distinct on multiple columns does not work
Date Thu, 20 May 2010 18:29:18 GMT

     [ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvind Prabhakar updated HIVE-287:
----------------------------------

           Status: Patch Available  (was: Open)
    Fix Version/s: 0.6.0

*Summary*
This patch makes the {{count()}} aggregate function consistent with SQL. Specifically:
* Supports {{SELECT count(*) FROM table}} queries, which return the total number of rows
in the table.
* Extends {{count()}} to accept a list of multiple expressions. {{count(DISTINCT expr1,
expr2, ...)}} returns the number of distinct, non-NULL rows produced by evaluating the
expressions.
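
The semantics above can be modeled with a small Python sketch. This is an illustration only, not Hive's implementation: the function names and sample rows are hypothetical, NULL is represented by {{None}}, and the sketch assumes that a row is excluded from the distinct count when any of its expressions evaluates to NULL.

```python
from typing import Any, Iterable, Sequence

# One row holds the already-evaluated expression values; None stands for NULL.
Row = Sequence[Any]


def count_star(rows: Iterable[Row]) -> int:
    """Model of count(*): every row counts, NULLs included."""
    return sum(1 for _ in rows)


def count_expr(rows: Iterable[Row], index: int) -> int:
    """Model of count(expr): rows whose expression value is not NULL."""
    return sum(1 for row in rows if row[index] is not None)


def count_distinct(rows: Iterable[Row]) -> int:
    """Model of count(DISTINCT expr1, expr2, ...).

    Assumption: a row is skipped if any expression is NULL; the remaining
    rows are compared as whole tuples.
    """
    seen = set()
    for row in rows:
        if any(value is None for value in row):
            continue
        seen.add(tuple(row))
    return len(seen)


# Hypothetical (col1, col2) values for a five-row table.
sample_rows = [(1, "a"), (1, "a"), (1, None), (2, "b"), (None, None)]

print(count_star(sample_rows))      # 5
print(count_expr(sample_rows, 0))   # 4
print(count_distinct(sample_rows))  # 2
```

Under these assumptions the three forms give different answers on the same data, which is the distinction the patch introduces.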

*Details*
* Modified the HiveQL grammar to allow a function to be invoked with a single * in place of
the parameter list.
* Propagated the presence of * as the parameter, and of the {{DISTINCT}} keyword, through the
UDF resolver framework so that UDFs whose behavior depends on them can use this
information.
* Modified the {{count()}} UDAF to handle NULL values with the same semantics as regular
SQL.
* Added a test case that specifically exercises the newly introduced semantics of the count UDAF.
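
As a rough sketch of how a resolver could use the propagated information, the Python function below picks a counting mode from the number of parameters plus two flags. Everything here is hypothetical: the function name, the flag names, the mode strings, and the rejected combinations are assumptions made for illustration, not the patch's actual code or API.

```python
def resolve_count(num_params: int, is_all_columns: bool, is_distinct: bool) -> str:
    """Pick a counting mode from what the parser reported (hypothetical model).

    num_params     -- number of expressions in the parameter list
    is_all_columns -- True when the call was written as count(*)
    is_distinct    -- True when the DISTINCT keyword was specified
    """
    if is_all_columns:
        # count(*) carries no expressions; assume DISTINCT is rejected with *.
        if num_params != 0:
            raise ValueError("* cannot be combined with expressions")
        if is_distinct:
            raise ValueError("DISTINCT cannot be combined with *")
        return "count_rows"
    if num_params == 0:
        raise ValueError("count() needs * or at least one expression")
    if num_params > 1 and not is_distinct:
        # Assumption: a multi-expression list is only accepted with DISTINCT.
        raise ValueError("multiple expressions require DISTINCT")
    return "count_distinct" if is_distinct else "count_non_null"


print(resolve_count(0, True, False))   # count_rows
print(resolve_count(1, False, False))  # count_non_null
print(resolve_count(2, False, True))   # count_distinct
```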

*Testing*
Ran all tests. Only two failed (input20.q, input33.q), and both were found to fail on the
local trunk image as well.

If and when this patch is committed to the trunk, I will go ahead and update the Hive Wiki
with details and examples regarding the use of {{count()}} UDAF in various forms.


> count distinct on multiple columns does not work
> ------------------------------------------------
>
>                 Key: HIVE-287
>                 URL: https://issues.apache.org/jira/browse/HIVE-287
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Arvind Prabhakar
>             Fix For: 0.6.0
>
>         Attachments: HIVE-287-1.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

