hive-dev mailing list archives

From "John Sichi (JIRA)" <>
Subject [jira] Commented: (HIVE-287) count distinct on multiple columns does not work
Date Thu, 01 Jul 2010 02:07:52 GMT


John Sichi commented on HIVE-287:

Yes, if you can get it to work in the grammar itself (via a choice in the existing "function"
rule), that would be best.  If it's not possible to do it there for some reason, then semantic

Could you explain what you mean regarding impact?  Since Hive doesn't support COUNT(*) before
your patch, the only impact is on the rest of your patch, right?

BTW, here's the relevant BNF from my copy of the SQL:2003 standard (ISO/IEC 9075-2:2003 part
2 section 10.9), omitting some SQL/OLAP stuff such as <filter clause>:

<aggregate function> ::=
    COUNT <left paren> <asterisk> <right paren>
  | <general set function>

<general set function> ::=
    <set function type> <left paren> [ <set quantifier> ] <value expression> <right paren>

<set function type> ::=

<set quantifier> ::= DISTINCT | ALL

As you can see, the standard makes a special case for COUNT, allowing the star only there, and it
doesn't allow COUNT(DISTINCT *).
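
To make the special case concrete, here is a rough sketch in Python of a recognizer that follows the shape of the BNF above. It is only an illustration, not Hive's grammar or the patch: the names SET_FUNCTION_TYPES, tokenize, and is_aggregate_function are made up for this example, the list of set function names is an assumed subset, and <value expression> is reduced to a single identifier.

```python
import re

# Assumption: an illustrative subset of set function names; this is not
# taken from the BNF quoted above.
SET_FUNCTION_TYPES = {"COUNT", "SUM", "AVG", "MIN", "MAX"}
SET_QUANTIFIERS = {"DISTINCT", "ALL"}

TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|[()*])")


def tokenize(text):
    """Split text into identifiers, parentheses, and the asterisk."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_aggregate_function(text):
    """Return True if text matches the simplified <aggregate function> rule."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    upper = [t.upper() for t in tokens]

    # Alternative 1: COUNT <left paren> <asterisk> <right paren>
    if upper == ["COUNT", "(", "*", ")"]:
        return True

    # Alternative 2: <general set function>
    if len(upper) < 4 or upper[0] not in SET_FUNCTION_TYPES or upper[1] != "(":
        return False
    rest = upper[2:]
    if rest[0] in SET_QUANTIFIERS:  # optional <set quantifier>
        rest = rest[1:]
    # Assumption: <value expression> is reduced to a single identifier.
    return len(rest) == 2 and rest[0].isidentifier() and rest[1] == ")"
```

With this sketch, is_aggregate_function("COUNT(*)") and is_aggregate_function("COUNT(DISTINCT col1)") both return True, while "COUNT(DISTINCT *)" and "SUM(*)" are rejected, because the star is reachable only through the first alternative.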

> count distinct on multiple columns does not work
> ------------------------------------------------
>                 Key: HIVE-287
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Arvind Prabhakar
>         Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, HIVE-287-4.patch
> The following query does not work:
> select count(distinct col1, col2) from Tbl

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
