hadoop-mapreduce-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: What is the right way to do map-side joins in Hadoop 1.0?
Date Sun, 15 Jan 2012 12:48:54 GMT
Hi Mike
           Have a look at CompositeInputFormat; I believe it is what you are
looking for to achieve map-side joins. If you are fine with a reduce-side
join, go with MultipleInputs instead. I have tried that sort of join using
MultipleInputs and have written up something on it. Check whether it is
useful for you (a very crude implementation :), you may have better ways):
http://kickstarthadoop.blogspot.com/2011/09/joins-with-plain-map-reduce.html
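
For reference, here is a rough, untested sketch of how a CompositeInputFormat
map-side join is usually wired up with the old mapred API that ships with
Hadoop 1.0. The paths and class names below are made up for illustration; the
hard requirement is that both inputs are already sorted by key and partitioned
identically (same number of part files, same partitioner).

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.join.CompositeInputFormat;
import org.apache.hadoop.mapred.join.TupleWritable;

public class MapSideJoinSketch {

  // The composite reader hands the mapper each key once, with a TupleWritable
  // holding one value per composed input (index 0 = first path, 1 = second).
  public static class JoinMapper extends MapReduceBase
      implements Mapper<Text, TupleWritable, Text, Text> {
    public void map(Text key, TupleWritable values,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      out.collect(key, new Text(values.get(0) + "\t" + values.get(1)));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapSideJoinSketch.class);
    conf.setJobName("map-side-join-sketch");

    // "inner" drops keys missing from either side; "outer" keeps them.
    // Both paths (hypothetical) must be sorted by key and partitioned identically.
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/constant-dataset"),
        new Path("/data/job-output")));

    conf.setInputFormat(CompositeInputFormat.class);
    conf.setMapperClass(JoinMapper.class);
    conf.setNumReduceTasks(0);              // the join happens entirely on the map side
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(conf, new Path("/data/join-output"));

    JobClient.runJob(conf);
  }
}

MultipleInputs, by contrast, tags each record with its source in the mappers
and does the actual join in the reducer, which is roughly what the blog post
above walks through.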

Hope it helps!...

Regards
Bejoy.K.S

On Sun, Jan 15, 2012 at 4:34 PM, Mike Spreitzer <mspreitz@us.ibm.com> wrote:

> BTW, each key appears exactly once in the large constant dataset, and
> exactly once in each MR job's output.
>
> I am thinking the right approach is to consistently partition the job
> output and the large constant dataset, with the number of partitions equal
> to the number of reduce tasks; each part goes into its own file. Then make
> an InputFormat whose number of splits equals the number of reduce tasks.
> Reading a split would consist of reading the corresponding pair of files,
> stepping through each. This seems like something that should already be
> provided somewhere in org.apache.hadoop.mapreduce.*.
>
> Thanks,
> Mike
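
What Mike describes (consistently partitioned, sorted part files that a
composed split can step through in lock-step) is essentially the precondition
the old-API join package expects. Below is a rough, untested sketch of writing
a dataset that way; the paths and the identity job are hypothetical, and the
essential parts are the shared partitioner and the shared reduce-task count.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.HashPartitioner;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ConsistentPartitionSketch {
  public static void main(String[] args) throws Exception {
    int numPartitions = 16;                               // same value for every dataset to be joined

    JobConf conf = new JobConf(ConsistentPartitionSketch.class);
    conf.setJobName("repartition-constant-dataset");

    conf.setInputFormat(KeyValueTextInputFormat.class);   // key<TAB>value text records
    conf.setMapperClass(IdentityMapper.class);            // pass records through unchanged
    conf.setReducerClass(IdentityReducer.class);          // each reducer writes its keys in sorted order

    conf.setNumReduceTasks(numPartitions);                // identical for both datasets
    conf.setPartitionerClass(HashPartitioner.class);      // identical partitioning function

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setOutputFormat(TextOutputFormat.class);         // part-NNNNN files readable by KeyValueTextInputFormat

    FileInputFormat.setInputPaths(conf, new Path("/data/constant-dataset-raw")); // hypothetical
    FileOutputFormat.setOutputPath(conf, new Path("/data/constant-dataset"));    // hypothetical

    JobClient.runJob(conf);
  }
}

Given two datasets written this way, part-00000 of one pairs with part-00000
of the other, which is what CompositeInputFormat relies on when it builds one
composite split per matched set of files.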
