hadoop-mapreduce-user mailing list archives

From Joan <joan.monp...@gmail.com>
Subject Re: Chain multiple jobs
Date Fri, 11 Feb 2011 08:54:43 GMT
I don't understand why, when ControlledJob adds a depending job (job2), it
tries to load TextInputFormat with job1's output: obviously job1's output
doesn't exist yet, because job1 hasn't run.

How do I do this? How can I add this dependency?

Schema:

job1: input (file, already exists in HDFS) --> output (file)
job2: input (job1's output) --> output (file)
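
For reference, a minimal sketch of how the two jobs' paths could be wired
with the new mapreduce API (the directory names "input", "job1-output" and
"job2-output" are made up for illustration):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    Path input   = new Path("input");        // already exists in HDFS
    Path job1Out = new Path("job1-output");  // created when job1 runs
    Path job2Out = new Path("job2-output");

    // job1 reads the existing input and writes the intermediate directory.
    FileInputFormat.addInputPath(job1, input);
    FileOutputFormat.setOutputPath(job1, job1Out);

    // job2 reads job1's output directory; it only has to exist by the
    // time job2 is actually submitted, i.e. after job1 has succeeded.
    FileInputFormat.addInputPath(job2, job1Out);
    FileOutputFormat.setOutputPath(job2, job2Out);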

How do I tell ControlledJob or JobControl not to resolve the new
dependency (job2's input) until job1 has finished?
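
As far as I can tell, JobControl is supposed to handle exactly this: it
only submits a job once all of its depending jobs have succeeded, so job2's
input should not be checked until job1 has written it. The usual pattern,
as I understand it, is to run the JobControl in its own thread and poll it
until everything finishes (an untested sketch, reusing cjob1/cjob2 from the
code quoted below):

    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    JobControl control = new JobControl("chain");
    control.addJob(cjob1);   // no dependencies
    control.addJob(cjob2);   // depends on cjob1

    // JobControl implements Runnable, so it runs in a plain Thread.
    Thread controller = new Thread(control);
    controller.setDaemon(true);
    controller.start();

    // Wait until every job has either succeeded or failed.
    while (!control.allFinished()) {
        Thread.sleep(500);   // handle/declare InterruptedException in real code
    }
    control.stop();

    if (!control.getFailedJobList().isEmpty()) {
        System.err.println("failed jobs: " + control.getFailedJobList());
    }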

Thanks

Joan


2011/2/10 Joan <joan.monplet@gmail.com>

> Hi,
>
> I've two jobs and I'm trying to control them by ControlledJob.
>
> job2 depends on job1, and job2's input is job1's output, so when I do
> this:
>
>           cjob1 = new ControlledJob(job1, null);
>
>           dependingJobs = new ArrayList<ControlledJob>();
>           dependingJobs.add(cjob1);
>
>           cjob2 = new ControlledJob(job2, dependingJobs);
>
>           JobControl theControl = new JobControl("name");
>
>           theControl.addJob(cjob1);
>           theControl.addJob(cjob2);
>
>           Thread theController = new Thread(theControl);
>           theController.start();
>
>
>
> I get the exception:
>
> Exception in thread "main" java.io.FileNotFoundException: File "output from
> job1" does not exist.
>
> Because when "cjob2 = new ControlledJob(job2, dependingJobs);" is
> instantiated, job2's input doesn't exist yet.
>
> Can someone help me?
>
> Thanks
>
> Joan
>
