hadoop-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Multi-stage map/reduce jobs
Date Fri, 23 Nov 2012 22:50:08 GMT
Hadoop does not provide an API for orchestrating MapReduce jobs; fortunately, there is no need for one.  Each MapReduce job can simply be run like a normal Java class.

So, how do you run multiple MapReduce jobs?

Easy: write a main() method in a single driver class that submits each job in turn, calling
waitForCompletion() on each one.  That method blocks until the job finishes, so each stage
starts only after the previous one completes.
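
For example, here is a minimal driver sketch using the org.apache.hadoop.mapreduce (Hadoop 2)
API.  The StageNMapper/StageNReducer classes, the TwoStageDriver name, and the paths are
placeholders for your own code; the empty bodies just inherit the identity behavior of the
base Mapper/Reducer so the sketch compiles and runs as a pass-through:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStageDriver {

  // Placeholder stages: the default Mapper/Reducer bodies are identity,
  // so each job passes records through.  Swap in your real logic.
  public static class Stage1Mapper extends Mapper<LongWritable, Text, LongWritable, Text> {}
  public static class Stage1Reducer extends Reducer<LongWritable, Text, LongWritable, Text> {}
  public static class Stage2Mapper extends Mapper<LongWritable, Text, LongWritable, Text> {}
  public static class Stage2Reducer extends Reducer<LongWritable, Text, LongWritable, Text> {}

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);  // the "intermediate spot" between stages
    Path output = new Path(args[2]);

    int exitCode = 1;
    try {
      // Stage 1: reads the raw input, writes to the intermediate path.
      Job job1 = Job.getInstance(conf, "stage-1");
      job1.setJarByClass(TwoStageDriver.class);
      job1.setMapperClass(Stage1Mapper.class);
      job1.setReducerClass(Stage1Reducer.class);
      job1.setOutputKeyClass(LongWritable.class);
      job1.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job1, input);
      FileOutputFormat.setOutputPath(job1, intermediate);

      // waitForCompletion() blocks until stage 1 finishes; only start
      // stage 2 if stage 1 succeeded.
      if (job1.waitForCompletion(true)) {
        // Stage 2: reads what stage 1 wrote.
        Job job2 = Job.getInstance(conf, "stage-2");
        job2.setJarByClass(TwoStageDriver.class);
        job2.setMapperClass(Stage2Mapper.class);
        job2.setReducerClass(Stage2Reducer.class);
        job2.setOutputKeyClass(LongWritable.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);
        exitCode = job2.waitForCompletion(true) ? 0 : 1;
      }
    } finally {
      // The framework does not clean up intermediate output for you; this
      // delete runs whether or not stage 2 ran, so nothing is left around.
      FileSystem.get(conf).delete(intermediate, true);
    }
    System.exit(exitCode);
  }
}

Note the finally block: if stage 1 succeeds but stage 2 fails, the stage 1 files would
otherwise be left on HDFS.  If you want to keep them around for debugging failed runs,
move the delete so it only fires on success.
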

Jay Vyas 

On Nov 23, 2012, at 5:22 PM, Sean McNamara <Sean.McNamara@Webtrends.com> wrote:

> It's not clear to me how to stitch together multiple map reduce jobs.  Without using
> Cascading or something else like it, is the method basically to write to an intermediate
> spot, and have the next stage read from there?
> If so, how are jobs responsible for cleaning up the temp/intermediate data they create?
> What happens if stage 1 completes, and stage 2 doesn't?  Do the stage 1 files get left around?
> Does anyone have some insight they could share?
> Thanks.
