hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: unexplode?
Date Thu, 23 Aug 2012 17:31:28 GMT
Also, I have a collect UDF:
https://github.com/edwardcapriolo/hive-collect

It helps here because collect_set removes duplicates.
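To make the duplicate issue concrete, here is a minimal Python sketch (not Hive code) of two ways to regroup exploded (uniqueId, element) rows. `collect_set` mimics Hive's built-in: repeated elements for a key are dropped. `collect_list` is a hypothetical stand-in for a duplicate-preserving collect such as the UDF linked above, which is presumably what you want when the original array could contain repeats. Both function names are illustrative only. Neither Hive aggregate guarantees element order. The sketch keeps first-seen order purely so the output is deterministic.

```python
from collections import defaultdict


def collect_list(rows):
    """Group (key, value) rows by key, keeping every value, duplicates included."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def collect_set(rows):
    """Group (key, value) rows by key, dropping duplicate values per key.

    A dict is used as an insertion-ordered set so the result is deterministic;
    Hive's collect_set itself makes no ordering promise.
    """
    groups = defaultdict(dict)
    for key, value in rows:
        groups[key].setdefault(value, None)
    return {key: list(values) for key, values in groups.items()}


# Exploded rows where uniqueId 1 originally held ["a", "b", "a"].
exploded = [(1, "a"), (1, "b"), (1, "a"), (2, "c")]
print(collect_list(exploded))  # {1: ['a', 'b', 'a'], 2: ['c']}
print(collect_set(exploded))   # {1: ['a', 'b'], 2: ['c']}
```

With the set-style aggregate, uniqueId 1 comes back as two elements rather than three. Use it only when the original arrays are known to hold distinct values.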

On Thu, Aug 23, 2012 at 1:26 PM, Philip Tromans
<philip.j.tromans@gmail.com> wrote:
> insert into originalTable
> select uniqueId, collect_set(whatever) from explodedTable group by uniqueId
>
> will probably do the trick.
>
> Phil.
>
> On 23 August 2012 17:45, Mike Fleming <mike@obvious.com> wrote:
>> I see that Hive has a way to take a table and produce multiple rows.
>>
>> Is there a built in way to do the reverse?
>>
>> Say I have a table with a unique key and an array. I do this:
>>
>>> insert into table explodedTable
>>> select uniqueId, thing from originalTable
>>> lateral view explode(arrayOfThings) t as thing
>>
>> Now I have a table with a row for each (uniqueId, element in arrayOfThings).
>>
>> Is there any way to take the contents of explodedTable and essentially
>> produce the original table, reconstructing the arrayOfThings for each
>> uniqueId?
>>
>> It seems, conceptually, that if I "cluster by uniqueId" then a reducer knows
>> that it will get all rows for each uniqueId bundled together, so it ought to
>> be fairly feasible to simply emit an unexploded row. However, I can't seem
>> to find a built-in way to do this.
>>
>> Mike
>>
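Mike's reducer idea can also be done without an aggregate function by using Hive's streaming TRANSFORM support. CLUSTER BY uniqueId sends all rows for a key to the same reducer and sorts them by key, so a script only needs to merge adjacent rows with the same key. Below is a minimal Python sketch of such a script, under these assumptions: the input is the default tab-separated (uniqueId, element) stream, and elements are joined with a comma. The script and function names are hypothetical. In Hive it would be shipped with ADD FILE and invoked via SELECT TRANSFORM(...) USING 'python unexplode.py' over a subquery that does CLUSTER BY uniqueId.

```python
import sys
from itertools import groupby


def unexplode(lines, sep=","):
    """Collapse tab-separated (uniqueId, element) lines into one line per uniqueId.

    Assumes lines with the same uniqueId are adjacent, which CLUSTER BY
    uniqueId guarantees within each reducer's input stream.
    """
    # Split each non-blank line into [uniqueId, element] on the first tab.
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip("\n"))
    for key, group in groupby(rows, key=lambda row: row[0]):
        elements = [row[1] for row in group]
        # Keep duplicates and the arrival order of elements for this key.
        yield key + "\t" + sep.join(elements)


if __name__ == "__main__":
    for out_line in unexplode(sys.stdin):
        print(out_line)
```

The script emits the reconstructed array as a delimited string. Hive's built-in split() can turn it back into an array<string>, provided the elements themselves never contain the separator.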
