hadoop-common-user mailing list archives

From Deyaa Adranale <deyaa.adran...@iais.fraunhofer.de>
Subject Re: How to chain multiple hadoop jobs?
Date Thu, 10 Jul 2008 08:31:28 GMT
I have checked the JobControl code: it submits a set of jobs 
asynchronously and provides methods for checking their status, suspending 
them, and so on.

I think what Mori means by chaining jobs is executing them one after 
another, so this class might not help him.
I have run chained jobs like Mori's code (even with a for loop and a 
call to runJob inside it). In my case, I can't use JobControl, 
because every job needs information from the output of the previous job, 
so they have to be chained.
So far, I have never encountered problems when running chained jobs, 
although I have not tested this with datasets larger than a few hundred KB.
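
To make the loop idea concrete, here is a minimal sketch of the pattern, not my actual code. It has no Hadoop dependency: JobRunner is a hypothetical stand-in for one blocking job run (in a real driver its body would build a JobConf and call JobClient.runJob), and the names ChainedJobs, runChain and the step-N directory layout are assumptions made up for this example. The point is only that each step's output path becomes the next step's input path, and that the chain stops at the first failure.

```java
import java.util.ArrayList;
import java.util.List;

final class ChainedJobs {

    /**
     * Hypothetical stand-in for one blocking job run. In a real driver this
     * would configure a JobConf for the given paths and call
     * JobClient.runJob(conf), which returns only after the job has finished.
     */
    public interface JobRunner {
        boolean run(String inputPath, String outputPath) throws Exception;
    }

    /**
     * Runs `steps` jobs strictly one after another. The output directory of
     * step i is used as the input directory of step i + 1.
     * Returns the output path of the last step.
     */
    public static String runChain(String firstInput, String workDir,
                                  int steps, JobRunner runner) throws Exception {
        String input = firstInput;
        for (int i = 1; i <= steps; i++) {
            // Assumed layout: one fresh output directory per step.
            String output = workDir + "/step-" + i;
            if (!runner.run(input, output)) {
                // Do not start later jobs on incomplete data.
                throw new IllegalStateException("step " + i + " failed; stopping the chain");
            }
            input = output; // the next job reads what this job wrote
        }
        return input;
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        // Fake runner that only records the paths it was asked to process.
        String last = runChain("input", "work", 3, (in, out) -> {
            log.add(in + " -> " + out);
            return true;
        });
        System.out.println(log);
        System.out.println("final output: " + last);
    }
}
```

Because each call blocks until its job is done, the next job never starts before its input exists, which is the property I need and the reason a purely asynchronous submit does not fit my case.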

hope this helps,

Lukas Vlcek wrote:
> Hi,
> Maybe you should take a look at JobControl (see TestJobControl.java for
> a concrete example).
> Regards,
> Lukas
> On Wed, Jul 9, 2008 at 10:28 PM, Mori Bellamy <mbellamy@apple.com> wrote:
>> Hey all,
>> I'm trying to chain multiple mapreduce jobs together to accomplish a
>> complex task. I believe that the way to do it is as follows:
>> JobConf conf = new JobConf(getConf(), MyClass.class);
>> //configure job.... set mappers, reducers, etc
>> SequenceFileOutputFormat.setOutputPath(conf,myPath1);
>> JobClient.runJob(conf);
>> //new job
>> JobConf conf2 = new JobConf(getConf(), MyClass.class);
>> SequenceFileInputFormat.setInputPath(conf2, myPath1);
>> //more configuration...
>> JobClient.runJob(conf2);
>> Is this the canonical way to chain jobs? I'm having some trouble with this
>> method -- for especially long jobs, the latter MR tasks sometimes do not
>> start up.