hadoop-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Collecting MAP output in a Iterator
Date Tue, 21 Aug 2012 08:02:50 GMT
Hi Siddharth

To add on: if one of your data sets/tables is small enough to fit in
memory, you can distribute it through the DistributedCache and use it as a
lookup while streaming the larger data set to perform the join. With this you
can avoid the reduce phase in your join entirely, thereby giving
a performance edge to your jobs.
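
A rough sketch of the idea in plain Java, with no Hadoop classes so it runs
on its own. The class name, method names, and the comma-separated
"key,value" record layout are all assumptions made for illustration. In a
real job, loadSmallTable would roughly correspond to what the mapper does
once in its setup step with the cached file, and joinRecord to what map()
does per input record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone illustration of a map-side (lookup) join.
public class MapSideJoinSketch {

    // Stands in for the mapper's one-time setup: read the small table
    // (the one shipped to every task) into an in-memory lookup.
    // Assumed line format: "key,value".
    static Map<String, String> loadSmallTable(List<String> lines) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                lookup.put(parts[0], parts[1]);
            }
        }
        return lookup;
    }

    // Stands in for map(): called once per record of the large data set.
    // Returns the joined record, or null when the key has no match
    // (inner-join behaviour), so no reduce step is needed.
    static String joinRecord(String largeRecord, Map<String, String> lookup) {
        String[] parts = largeRecord.split(",", 2);
        if (parts.length < 2) {
            return null; // skip malformed records
        }
        String match = lookup.get(parts[0]);
        return match == null ? null : parts[0] + "," + parts[1] + "," + match;
    }

    public static void main(String[] args) {
        // Small table: department id -> department name (made-up data).
        Map<String, String> lookup =
                loadSmallTable(Arrays.asList("1,Sales", "2,Engineering"));

        // Large data set, streamed one record at a time (made-up data).
        List<String> large = Arrays.asList("1,alice", "2,bob", "3,carol");

        List<String> joined = new ArrayList<>();
        for (String record : large) {
            String out = joinRecord(record, lookup);
            if (out != null) {
                joined.add(out);
            }
        }
        for (String row : joined) {
            System.out.println(row);
        }
    }
}
```

The record with key 3 has no entry in the small table, so it is dropped,
which matches an inner join. Keep in mind the whole small table has to fit
in each task's heap for this to work.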

Details on map join in Hive:

Bejoy KS
