hive-user mailing list archives

From Philip Tromans <philip.j.trom...@gmail.com>
Subject Re: unexplode?
Date Thu, 23 Aug 2012 17:26:06 GMT
insert into originalTable
select uniqueId, collect_set(whatever) from explodedTable group by uniqueId

will probably do the trick.
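
A slightly fuller sketch of the same idea, using the table and column names from this thread (`whatever` stands for whichever column holds the exploded element). One caveat: collect_set drops duplicate elements and makes no promise about element order, so the rebuilt array matches the original only if the original had no repeats and order does not matter. The collect_list alternative is an assumption about your environment, since it needs Hive 0.13 or later.

```sql
-- Rebuild one array per uniqueId from the exploded rows.
-- collect_set removes duplicate elements and does not guarantee order.
INSERT INTO TABLE originalTable
SELECT uniqueId, collect_set(whatever) AS arrayOfThings
FROM explodedTable
GROUP BY uniqueId;

-- Alternative, assuming Hive 0.13 or later: collect_list keeps duplicates.
-- Element order is still not guaranteed.
-- INSERT INTO TABLE originalTable
-- SELECT uniqueId, collect_list(whatever) AS arrayOfThings
-- FROM explodedTable
-- GROUP BY uniqueId;
```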

Phil.

On 23 August 2012 17:45, Mike Fleming <mike@obvious.com> wrote:
> I see that Hive has a way to take a table and produce multiple rows.
>
> Is there a built in way to do the reverse?
>
> Say I have a table with a unique key and an array. I do this:
>
>> insert into table explodedTable select uniqueId, thing from originalTable
>> lateral view explode(arrayOfThings) t as thing
>
> Now I have a table with a row for each (uniqueId, element in arrayOfThings).
>
> Is there any way to take the contents of explodedTable and essentially
> produce the original table, reconstructing the arrayOfThings for each
> uniqueId?
>
> It seems, conceptually, that if I "cluster by uniqueId" then a reducer knows
> that it will get all rows for each uniqueId bundled together, so it ought to
> be fairly feasible to simply emit an unexploded row. However, I can't seem
> to find a built-in way to do this.
>
> Mike
>
