hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: joins in map reduce
Date Wed, 21 May 2008 18:59:08 GMT


Also, if one source of the join is small enough to fit in memory, you can
build an in-memory table and do the map-side join on unsorted data.
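
A minimal sketch of that idea, in plain Python rather than the Hadoop API. The function names and sample data are made up for illustration. The small side is loaded into a dictionary keyed on the join key, and the large side is streamed past it in whatever order it arrives:

```python
from collections import defaultdict

def build_table(small_records):
    """Index the small input by join key; this side must fit in memory."""
    table = defaultdict(list)
    for key, value in small_records:
        table[key].append(value)
    return table

def map_side_join(large_records, table):
    """Stream the large input in any order and emit joined rows (inner join)."""
    for key, value in large_records:
        # .get() avoids creating empty entries for keys with no match.
        for small_value in table.get(key, ()):
            yield key, value, small_value

# Hypothetical sample data: (user_id, name) is the small side.
users = [(1, "ann"), (2, "bob")]
clicks = [(2, "/home"), (1, "/cart"), (3, "/faq"), (2, "/buy")]

joined = list(map_side_join(clicks, build_table(users)))
```

In a real job each map task would presumably build the table once before processing its split, so the cost of loading the small side is paid per task rather than per record.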


On 5/21/08 11:43 AM, "Owen O'Malley" <oom@yahoo-inc.com> wrote:

> 
> On May 21, 2008, at 11:16 AM, Shirley Cohen wrote:
> 
>> How does one do a join operation in map reduce? Is there more than
>> one way to do a join? Which way works better and why?
> 
> There are a couple of ways, depending on what you need to do. If your
> input data is sorted and partitioned equivalently on the same key,
> you can do a join before the map (aka map-side join). The
> documentation is at:  http://tinyurl.com/5v4rot
> 
> If your data is not sorted and partitioned consistently, you need to
> do the join in the reduce. There is a library to help at:
> http://tinyurl.com/5cz669
> 
> -- Owen
> 
> 
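
For the reduce-side case described in the quoted message, a rough sketch in plain Python might look like the following. This is not the library linked above. The tags "L" and "R", the function names, and the sample data are assumptions for illustration, and the `shuffle` function only stands in for the framework's sort and shuffle step:

```python
from collections import defaultdict

def map_phase(records, tag):
    """Tag each record with its source so the reducer can tell them apart."""
    for key, value in records:
        yield key, (tag, value)

def shuffle(pairs):
    """Group values by key, standing in for the framework's sort and shuffle."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups

def reduce_phase(key, tagged_values):
    """Pair every left value with every right value for one key (inner join)."""
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    for l in left:
        for r in right:
            yield key, l, r

def reduce_side_join(left_records, right_records):
    pairs = list(map_phase(left_records, "L")) + list(map_phase(right_records, "R"))
    groups = shuffle(pairs)
    out = []
    for key in sorted(groups):
        out.extend(reduce_phase(key, groups[key]))
    return out

# Hypothetical sample data; neither input is sorted or partitioned.
orders = [(2, "book"), (1, "pen"), (2, "lamp")]
users = [(1, "ann"), (3, "cy"), (2, "bob")]
result = reduce_side_join(users, orders)
```

Neither input needs to be sorted or to fit in memory as a whole here, but every record crosses the shuffle, which is the price relative to a map-side join.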

