hive-user mailing list archives

From Namit Jain
Subject RE: Unions causing many scans of input - workaround?
Date Mon, 08 Nov 2010 05:55:21 GMT
Another option would be to create a wrapper script (using neither a UDF nor a UDTF).

That script, in any language, can emit any number of output rows per input row.

Look at:

for details
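
As a rough illustration of the wrapper-script idea, here is a minimal sketch of a script that reads tab-separated rows on stdin and writes one output row per zoom level to stdout. Everything specific in it is an assumption: the input column order (name_id, latitude, longitude), the file name expand_tiles.py, and the tile math, which uses the common Web Mercator tile formula as a stand-in for the poster's own toTileX/toTileY UDFs. It also covers only the zoom fan-out, not the 8 ways of deriving name_id.

```python
#!/usr/bin/env python
"""Hypothetical wrapper script: emit one row per zoom level for each input row."""
import math
import sys

NUM_ZOOMS = 23  # zoom levels 0..22, matching the 23 levels mentioned in the thread
MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def to_tile_x(lon, zoom):
    # Assumed stand-in for toTileX: Web Mercator tile column, clamped to the grid.
    n = 1 << zoom
    return min(n - 1, max(0, int((lon + 180.0) / 360.0 * n)))


def to_tile_y(lat, zoom):
    # Assumed stand-in for toTileY: Web Mercator tile row, clamped to the grid.
    n = 1 << zoom
    lat_rad = math.radians(max(-MAX_LAT, min(MAX_LAT, lat)))
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n
    return min(n - 1, max(0, int(y)))


def expand(line):
    # One input row (name_id, latitude, longitude) becomes NUM_ZOOMS output rows.
    name_id, lat, lon = line.rstrip("\n").split("\t")
    lat, lon = float(lat), float(lon)
    for zoom in range(NUM_ZOOMS):
        fields = (name_id, to_tile_x(lon, zoom), to_tile_y(lat, zoom), zoom)
        yield "\t".join(str(f) for f in fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        if not line.strip():
            continue  # skip blank lines
        for row in expand(line):
            stdout.write(row + "\n")


if __name__ == "__main__":
    main()
```

Such a script would typically be registered with ADD FILE and invoked through Hive's SELECT TRANSFORM(...) USING '...' AS (...) clause in a subquery, with a single outer GROUP BY over name_id, x, y and zoom, so the table is read once and the grouping happens in the shuffle.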

From: Sonal Goyal
Sent: Sunday, November 07, 2010 8:40 PM
Subject: Re: Unions causing many scans of input - workaround?

Hey Tim,

You have an interesting problem. Have you tried creating a UDTF for your case, so that you
can possibly emit more than one record for each row of your input?

Thanks and Regards,

Sonal Goyal | Founder and CEO | Nube Technologies LLP

On Mon, Nov 8, 2010 at 2:31 AM, Tim Robertson wrote:
Hi all,

I am porting custom MR code to Hive and have written working UDFs
where I need them.  Is there a workaround to having to do this:

select * from (
   select name_id, toTileX(longitude,0) as x, toTileY(latitude,0) as y,
          0 as zoom, funct2(longitude,0) as f2_x, funct2(latitude,0) as f2_y,
          count(1) as count
   from table
   group by name_id, x, y, f2_x, f2_y

   union all

   select name_id, toTileX(longitude,1) as x, toTileY(latitude,1) as y,
          1 as zoom, funct2(longitude,1) as f2_x, funct2(latitude,1) as f2_y,
          count(1) as count
   from table
   group by name_id, x, y, f2_x, f2_y

   -- etc etc increasing in zoom
)

The issue being that this does many passes over the table, whereas
previously in my Map() I would just emit many times from the same
input record and then let it all group in the shuffle and sort.
I actually emit 184 times for a single input record (23 zoom levels of
Google Maps times 8 ways to derive the name_id), which means 184 union
statements. Is it possible in Hive to force it to emit many times from
the source record in the stage-1 map?

(ahem) Does anyone know if Pig can do this if not in Hive?

I hope I have explained this well enough to make sense.

Thanks in advance,
