hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-474) Support for distinct selection on two or more columns
Date Thu, 07 Oct 2010 19:08:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919023#action_12919023 ]

Namit Jain commented on HIVE-474:
---------------------------------

Once HIVE-537 is committed, the general idea is as listed in the example in HIVE-537.


Say, the query is:

select a, count(distinct b), count(distinct c) from T group by a

and the data is:

a1   b1   c1
a1   b1   c2
a1   b2   c2
a1   b2   c1
a2   ...


The mapper will emit one row per distinct column, wrapped in a union type (tag 0 carries b, tag 1 carries c):

a1  0:b1
a1  1:c1
a1  0:b1
a1  1:c2
a1  0:b2
a1  1:c2
a1  0:b2
a1  1:c1


Since the sort key is (a, union_tag, (b|c)), the data will reach the reducer in the following order:

a1  0:b1
a1  0:b1
a1  0:b2
a1  0:b2
a1  1:c1
a1  1:c1
a1  1:c2
a1  1:c2

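The reducer can then stream the distincts: equal values arrive next to each other, so it only needs to compare each row with the previous one to count the distinct b and c values, without buffering them.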
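
To make the flow concrete, here is a minimal sketch of the idea in plain Python. It is not Hive code: the function names (map_rows, reduce_sorted, count_distincts) are made up for illustration, a tuple (a, tag, value) stands in for the union type, and an in-memory sort stands in for the shuffle/sort phase.

```python
# Sample rows of T(a, b, c), taken from the example in this comment.
ROWS = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b2", "c2"),
    ("a1", "b2", "c1"),
]


def map_rows(rows):
    """Emit one (a, tag, value) record per distinct column: tag 0 = b, tag 1 = c."""
    for a, b, c in rows:
        yield (a, 0, b)
        yield (a, 1, c)


def reduce_sorted(records):
    """Stream over records sorted by (a, tag, value).

    Yields (a, count(distinct b), count(distinct c)) each time the group
    key changes. Only the previous record and two counters are kept.
    """
    current_a = None
    counts = [0, 0]
    prev = None
    for rec in records:
        a, tag = rec[0], rec[1]
        if a != current_a:
            if current_a is not None:
                yield (current_a, counts[0], counts[1])
            current_a, counts = a, [0, 0]
        if rec != prev:  # first occurrence of this (a, tag, value)
            counts[tag] += 1
        prev = rec
    if current_a is not None:
        yield (current_a, counts[0], counts[1])


def count_distincts(rows):
    # sorted() is an assumption standing in for the framework's sort on
    # (a, union_tag, value) between the map and reduce phases.
    return list(reduce_sorted(sorted(map_rows(rows))))


if __name__ == "__main__":
    for a, n_b, n_c in count_distincts(ROWS):
        print(a, n_b, n_c)
```

For the sample data this gives 2 distinct b values and 2 distinct c values for a1. The point of the sketch is that the reducer never holds a set of seen values; the sort order does that work.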

> Support for distinct selection on two or more columns
> -----------------------------------------------------
>
>                 Key: HIVE-474
>                 URL: https://issues.apache.org/jira/browse/HIVE-474
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Alexis Rondeau
>            Assignee: Amareshwari Sriramadasu
>         Attachments: hive-474.0.4.2rc.patch
>
>
> The ability to select distinct values of several individual columns, for example:
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns not Supported user


