hadoop-mapreduce-user mailing list archives

From Arko Provo Mukherjee <arkoprovomukher...@gmail.com>
Subject Re: output from one map reduce job as the input to another map reduce job?
Date Tue, 27 Sep 2011 19:29:40 GMT
Hi,

I am not sure how you can avoid the filesystem. However, I did it as follows:

// For Job 1
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path(args[1]));

// For Job 2 (run it only after Job 1 has finished, since it reads Job 1's output)
FileInputFormat.addInputPath(job2, new Path(args[1]));
FileOutputFormat.setOutputPath(job2, new Path(args[2]));

Assuming:
args[0] --> Input to first mapper
args[1] --> Output of first reducer / Input to second mapper
args[2] --> Output of second reducer
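
To make the handoff concrete, here is a rough, plain-Java sketch of the same pattern. It does not use the Hadoop API: the class name ChainedStages, the methods stage1 and stage2, and the part-00000 file name are all hypothetical stand-ins for the two jobs and their output files. The only point it illustrates is that the second stage reads the directory the first stage wrote, and starts only after the first has finished.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a two-job driver; plain Java, no Hadoop classes.
public class ChainedStages {

    // Stand-in for Job 1: count words in every file under inputDir and
    // write "word<TAB>count" lines into outputDir.
    static void stage1(Path inputDir, Path outputDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (Path file : listFiles(inputDir)) {
            for (String line : Files.readAllLines(file)) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((word, count) -> lines.add(word + "\t" + count));
        writePart(outputDir, lines);
    }

    // Stand-in for Job 2: read the "word<TAB>count" lines produced by stage1
    // and write "count<TAB>word1,word2,..." lines into outputDir.
    static void stage2(Path inputDir, Path outputDir) throws IOException {
        Map<Integer, List<String>> wordsByCount = new TreeMap<>();
        for (Path file : listFiles(inputDir)) {
            for (String line : Files.readAllLines(file)) {
                String[] kv = line.split("\t");
                wordsByCount
                    .computeIfAbsent(Integer.parseInt(kv[1]), k -> new ArrayList<>())
                    .add(kv[0]);
            }
        }
        List<String> lines = new ArrayList<>();
        wordsByCount.forEach((count, words) ->
            lines.add(count + "\t" + String.join(",", words)));
        writePart(outputDir, lines);
    }

    // Regular files directly under dir, in sorted order.
    static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    // Create outputDir (fails if it already exists) and write one part file.
    static void writePart(Path outputDir, List<String> lines) throws IOException {
        Files.createDirectory(outputDir);
        Files.write(outputDir.resolve("part-00000"), lines);
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args[0]);        // input to the first stage
        Path intermediate = Paths.get(args[1]); // output of stage 1 / input to stage 2
        Path output = Paths.get(args[2]);       // output of the second stage

        stage1(input, intermediate);  // returns only once the intermediate data is written
        stage2(intermediate, output); // so it is safe to start the second stage here
    }
}
```

Running it as "java ChainedStages in mid out" mirrors the args[0] / args[1] / args[2] layout above. The sketch fails if the intermediate or output directory already exists, which is similar in spirit to Hadoop refusing to write into an existing output path.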

Hope this helps!
Warm regards
Arko

On Tue, Sep 27, 2011 at 2:09 PM, Kevin Burton <burton@spinn3r.com> wrote:
> Is it possible to connect the output of one map reduce job so that it is the
> input to another map reduce job?
> Basically… then reduce() outputs a key, that will be passed to another map()
> function without having to store intermediate data to the filesystem.
> Kevin
