hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Group By Concatenation
Date Fri, 02 Nov 2012 03:17:04 GMT
Collect_set() is built into hive. If you want a version that does not
de-duplicate look here.
https://github.com/edwardcapriolo/hive-collect

Caution both of these functions can produce out of memory if the
results are later then a mapper can store in memory.

On Thu, Nov 1, 2012 at 2:27 PM, Ratner, Alan S (IS) <Alan.Ratner@ngc.com> wrote:
> Sorry to ask what is probably a very naïve Hive question but here goes:
>
>
>
> I have a table as follows:
>
> Col1   Col2
>
> K1       V1
>
> K1       V1
>
> K2       V1
>
> K3       V1
>
> K1       V2
>
> K1       V3
>
> K2       V2
>
>
>
> Now I have managed to SELECT Col1,COUNT(DISTINCT Col2) FROM … BY COL1; to
> obtain
>
> K1       3
>
> K2       2
>
> K3       1
>
>
>
> But what I want is a concatenated list of all the distinct Col2 values for
> each Col1 key.
>
> K1       V1       V2       V3
>
> K2       V1       V2
>
> K3       V1
>
>
>
> This is something absolutely trivial in MR but I cannot seem to find
> anything in Hive that will do this for me.  Do I have to write a UDF to
> accomplish this?
>
>
>
>
>
> Alan
>
>

Mime
View raw message