hadoop-hive-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-707) add group_concat
Date Wed, 29 Jul 2009 18:22:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736737#action_12736737 ]

Ashish Thusoo commented on HIVE-707:
------------------------------------

Found the JIRA...

Since group by is done in the reducer, you could use the same trick that is used in

distribute by x sort by y

when we do MAP and REDUCE operators. By setting up the reduce sink in a similar way, you would
be able to ensure that each reducer gets the rows for a value of x in the sorted order of
y. You can look at how we generate plans for the transform operator and use the same strategy
in the group by code.

That should work, and of course in this case we would have to turn off any map-side aggregation?
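
To make the idea concrete, here is a minimal sketch in Python that simulates the shuffle outside of Hive. It is not Hive's operator code: the function names (shuffle, reduce_group_concat, group_concat) and the (x, y) row layout are assumptions made up for illustration. Rows are partitioned on x, each partition is sorted on (x, y), and the reducer then concatenates the y values of one x at a time in the order they arrive. No map-side partial aggregation is modeled, in line with the question above.

```python
from collections import defaultdict
from itertools import groupby


def shuffle(rows, num_reducers):
    """Simulate 'distribute by x sort by y' for (x, y) rows.

    Every row with the same x lands in the same partition, and each
    partition is sorted on (x, y) before the reducer sees it.
    """
    partitions = defaultdict(list)
    for x, y in rows:
        partitions[hash(x) % num_reducers].append((x, y))
    for part in partitions.values():
        part.sort()  # tuple sort: by x first, then by y
    return partitions


def reduce_group_concat(partition, sep=","):
    """Reducer side: rows for one x are contiguous and already sorted by y."""
    for x, group in groupby(partition, key=lambda row: row[0]):
        yield x, sep.join(y for _, y in group)


def group_concat(rows, num_reducers=2, sep=","):
    """Run the simulated shuffle, then every reducer, and collect the results."""
    result = {}
    for part in shuffle(rows, num_reducers).values():
        result.update(reduce_group_concat(part, sep))
    return result


if __name__ == "__main__":
    rows = [("a", "3"), ("b", "1"), ("a", "1"), ("a", "2"), ("b", "0")]
    print(group_concat(rows))
```

Because the ordering is established by the shuffle, the reducer only appends values as they arrive and never sorts a group itself. The concatenated string for a single key is still built in memory, so this sketch does not address the data skew concern raised in the issue.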


> add group_concat
> ----------------
>
>                 Key: HIVE-707
>                 URL: https://issues.apache.org/jira/browse/HIVE-707
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Min Zhou
>
> Moving the discussion to a new jira:
> I've implemented group_cat() in a rush, and found a couple of things difficult to solve:
> 1. group_cat() has an internal order by clause; currently, we can't implement
> such an aggregation in Hive.
> 2. When the strings to be concatenated are too large, in other words, if data skew
> appears, there is often not enough memory to store such a big result.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

