hadoop-hive-dev mailing list archives

From "John Sichi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-287) count distinct on multiple columns does not work
Date Thu, 17 Jun 2010 20:59:26 GMT

    [ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879929#action_12879929 ]

John Sichi commented on HIVE-287:
---------------------------------

Sorry to chime in late, but I have one big question on this one:  can we instead do it in a
way that does not break the UDAF interface?

The existing patch adds a new method to the GenericUDAFResolver interface, meaning all existing
plugin implementations outside of the Hive codebase will fail to compile (because we did not
already have the insulating abstract base class available).  We already have some of these
plugins within Facebook.

Let's analyze the two new parameters one by one.

isDistinct:  this doesn't actually modify the choice of evaluator implementation at all, since
the actual duplicate elimination takes place upstream of the UDAF invocation.  So instead
of adding this parameter, can we add a new method supportsDistinct() on GenericUDAFEvaluator?
We would then call it after instantiating the new evaluator in order to carry out the
additional validation.
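
To make the supportsDistinct() suggestion concrete, here is a rough sketch using stand-in
types.  Everything prefixed with Sketch is hypothetical and is not Hive's real API; the
default return value of true is also an assumption, chosen so that evaluators written before
the hook existed keep their current behavior.

```java
// Hypothetical stand-ins for illustration only; these are NOT Hive's real classes.
abstract class SketchEvaluator {
    // Proposed hook. Assumed default: accept DISTINCT, so older evaluators are unaffected.
    boolean supportsDistinct() {
        return true;
    }
}

// An evaluator that is happy to be used with DISTINCT (inherits the default).
class SketchCountEvaluator extends SketchEvaluator {
}

// An evaluator that opts out of DISTINCT by overriding the hook.
class SketchNoDistinctEvaluator extends SketchEvaluator {
    @Override
    boolean supportsDistinct() {
        return false;
    }
}

interface SketchResolver {
    // The resolver signature stays as it was: no isDistinct parameter.
    SketchEvaluator getEvaluator(String[] parameterTypes);
}

class SketchPlanner {
    // Resolve first, then validate DISTINCT against the evaluator that was chosen.
    static SketchEvaluator resolve(SketchResolver resolver,
                                   String[] parameterTypes,
                                   boolean isDistinct) {
        SketchEvaluator evaluator = resolver.getEvaluator(parameterTypes);
        if (isDistinct && !evaluator.supportsDistinct()) {
            throw new IllegalArgumentException(
                "DISTINCT is not supported by " + evaluator.getClass().getSimpleName());
        }
        return evaluator;
    }
}
```

The point of the sketch is that the DISTINCT check moves to the caller, after resolution, so
the resolver interface itself does not need to change.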

isAllColumns:  COUNT(*) is probably the only function which is ever even going to care about
this one.  Couldn't we just use an empty array of TypeInfo to indicate all columns?
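
As a rough illustration of the empty-array convention, with plain strings standing in for
TypeInfo and a hypothetical SketchCountResolver class (not Hive's real COUNT resolver):

```java
// Hypothetical stand-in for illustration only; NOT Hive's real COUNT resolver.
class SketchCountResolver {
    // Strings stand in for TypeInfo. Returns a label naming the evaluator that would be chosen.
    static String chooseEvaluator(String[] parameterTypes) {
        if (parameterTypes.length == 0) {
            // Assumed convention from the comment above: no parameter types means COUNT(*).
            return "count-all-rows";
        }
        // One or more parameter types: COUNT over the given expressions.
        return "count-expressions(" + parameterTypes.length + ")";
    }
}
```

Under this convention, COUNT(*) would arrive as chooseEvaluator(new String[0]) and
COUNT(col1, col2) as a two-element array.  What the sketch does not settle is whether some
function ever needs to tell an explicit zero-argument call apart from *.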

Independent of the above, I think adding the insulating abstract base should still be done
now to make future transitions smoother when interface-breaking is absolutely required.  So
keep that part of the patch.
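
For anyone unfamiliar with the pattern, this is roughly how an insulating abstract base
protects plugins.  All names below are hypothetical stand-ins, and the three-argument method
is only an example of "some method added to the interface later", not a proposal:

```java
// Hypothetical stand-ins for illustration only; NOT Hive's real interfaces.
interface SketchUdafResolver {
    // Original method that every plugin already implements.
    String getEvaluator(String[] parameterTypes);

    // A method added to the interface in a later release.
    String getEvaluator(String[] parameterTypes, boolean isDistinct, boolean isAllColumns);
}

// The insulating base: it supplies a default for the newer method by
// delegating to the original one, so subclasses need not know about it.
abstract class SketchAbstractUdafResolver implements SketchUdafResolver {
    @Override
    public String getEvaluator(String[] parameterTypes,
                               boolean isDistinct,
                               boolean isAllColumns) {
        return getEvaluator(parameterTypes);
    }
}

// A plugin written against the old interface shape. Because it extends the
// base class instead of implementing the interface directly, it still compiles.
class SketchPluginResolver extends SketchAbstractUdafResolver {
    @Override
    public String getEvaluator(String[] parameterTypes) {
        return "plugin-evaluator(" + parameterTypes.length + " args)";
    }
}
```

A plugin that implemented SketchUdafResolver directly would stop compiling as soon as the
second method appeared, which is the breakage described above.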


> count distinct on multiple columns does not work
> ------------------------------------------------
>
>                 Key: HIVE-287
>                 URL: https://issues.apache.org/jira/browse/HIVE-287
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Arvind Prabhakar
>         Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

