hadoop-hdfs-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Collecting MAP output in a Iterator
Date Tue, 21 Aug 2012 08:02:50 GMT
Hi Siddharth

To add on: if one of your data sets/tables is small enough to fit in
memory, you can distribute it via the Distributed Cache and use it as a
lookup while streaming the larger data set to perform the join. With this
you can avoid the reduce phase of the join entirely, which gives your jobs
a performance edge.
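
A minimal sketch of the idea in plain Python, not using the Hadoop API. The
function names (load_lookup, map_join) and the sample records are made up for
illustration. In a real job the small table would be read from the
Distributed Cache once per map task, and the large table would arrive as the
mapper's input records.

```python
import csv
import io


def load_lookup(small_table_lines):
    """Build an in-memory dict from the small table: join key -> other fields."""
    lookup = {}
    for row in csv.reader(small_table_lines):
        if row:
            lookup[row[0]] = row[1:]
    return lookup


def map_join(large_table_lines, lookup):
    """Stream the large table and emit joined rows; no reduce step is involved."""
    for row in csv.reader(large_table_lines):
        if not row:
            continue
        match = lookup.get(row[0])
        if match is not None:  # inner join: rows without a match are dropped
            yield row + match


# Hypothetical sample data: the first column is the join key in both tables.
small = io.StringIO("1,Sales\n2,Engineering\n")
large = io.StringIO("1,alice\n2,bob\n3,carol\n1,dave\n")

lookup = load_lookup(small)
joined = list(map_join(large, lookup))
```

The lookup is built once and then only read, so each large-table record is
joined with a single dictionary access as it streams through.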

Details on map joins in Hive:
https://cwiki.apache.org/Hive/joinoptimization.html
http://hive.apache.org/docs/r0.9.0/language_manual/joins.html

Regards
Bejoy KS
