hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Cleanup after a Job
Date Tue, 01 May 2012 15:06:36 GMT
That really depends on the API that you are using. In the newer API, o.a.h.mapreduce.OutputFormat.getOutputCommitter
returns the output committer to use. In the older API, which is the one that I expect you
are using, JobConf.getOutputCommitter returns it. Be careful: by default you are probably
using the FileOutputCommitter to put the files in the proper place when your map/reduce job
is done. If you replace the FileOutputCommitter with something else that does not do the
same things, your map/reduce jobs will stop working properly. Typically what you would want
to do is have your class inherit from FileOutputCommitter and then, in commitJob/abortJob,
call super.commitJob() or super.abortJob() respectively, and then do whatever else you want
to do.
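
For illustration, a minimal sketch of that pattern against the older o.a.h.mapred API (the one assumed above). The class name and the bodies of the extra-cleanup hooks are placeholders, not anything prescribed by Hadoop:

    import java.io.IOException;

    import org.apache.hadoop.mapred.FileOutputCommitter;
    import org.apache.hadoop.mapred.JobContext;

    public class CleanupOutputCommitter extends FileOutputCommitter {

      @Override
      public void commitJob(JobContext context) throws IOException {
        super.commitJob(context);          // let FileOutputCommitter move the output into place first
        // extra on-success cleanup (placeholder) goes here
      }

      @Override
      public void abortJob(JobContext context, int runState) throws IOException {
        super.abortJob(context, runState); // default cleanup of the temporary output first
        // extra on-failure cleanup (placeholder) goes here
      }
    }

With the old API the committer would be registered via JobConf.setOutputCommitter(CleanupOutputCommitter.class); with the new API you would return it from your OutputFormat's getOutputCommitter instead.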

--Bobby Evans

On 5/1/12 9:17 AM, "kasi subrahmanyam" <kasisubbu440@gmail.com> wrote:

Hi Robert,
Could you provide me the exact method of the JobControl Job or JobConf which calls the commitJob
method?
Thanks

On Tue, May 1, 2012 at 7:36 PM, Robert Evans <evans@yahoo-inc.com> wrote:
Either abortJob or commitJob will be called for all jobs. abortJob will be called if the
job has failed; commitJob will be called if it succeeded. The purpose of these is to commit
the output of the map/reduce job and clean up any temporary files/data that might be lying
around.

commitTask/abortTask are similar, and are called for each individual task.
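
The task-level hooks follow the same "call super, then do the extra work" pattern; a hedged fragment (old API again), with placeholder comments standing in for whatever per-task work is needed:

    import java.io.IOException;

    import org.apache.hadoop.mapred.FileOutputCommitter;
    import org.apache.hadoop.mapred.TaskAttemptContext;

    public class TaskCleanupCommitter extends FileOutputCommitter {

      @Override
      public void commitTask(TaskAttemptContext context) throws IOException {
        super.commitTask(context);  // promote this task's temporary output
        // per-task on-success bookkeeping (placeholder)
      }

      @Override
      public void abortTask(TaskAttemptContext context) throws IOException {
        super.abortTask(context);   // discard this task's temporary output
        // per-task on-failure bookkeeping (placeholder)
      }
    }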

--Bobby Evans



On 5/1/12 8:32 AM, "kasi subrahmanyam" <kasisubbu440@gmail.com> wrote:

Hi Arun,

I can see that the output committer is present in the reducer.
How do I make sure that this committer runs at the end of the job, or does it run by default
at the end of the job?
I can have more than one reducer task.




On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy <acm@hortonworks.com> wrote:
Use OutputCommitter.abortJob / OutputCommitter.commitJob:
http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html

Arun

On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote:

Hi,

I have a few jobs added to a JobControl.
I need an afterJob() to be executed after the completion of a Job.
For example:

Here I am actually overriding the Job of JobControl.
I have Job2 depending on the output of Job1. The input for Job2 is obtained after doing some
file system operations on the output of Job1. This operation should happen in an afterJob()
method which is available for each Job. How do I make sure that the afterJob() method is called
for each Job added to the controller before the jobs that depend on it are run?
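
A hedged sketch of the setup being described, using the old API's org.apache.hadoop.mapred.jobcontrol classes; the JobConf configuration is elided and all names here are placeholders:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class DependentJobsDriver {
      public static void main(String[] args) throws Exception {
        JobConf job1Conf = new JobConf(); // placeholder: configure Job1 here
        JobConf job2Conf = new JobConf(); // placeholder: configure Job2 here

        Job job1 = new Job(job1Conf);
        Job job2 = new Job(job2Conf);
        job2.addDependingJob(job1);       // job2 will not start until job1 completes

        JobControl control = new JobControl("afterJob-example");
        control.addJob(job1);
        control.addJob(job2);

        new Thread(control).start();      // JobControl implements Runnable
        while (!control.allFinished()) {
          Thread.sleep(1000);             // poll until every job is done or failed
        }
        control.stop();
      }
    }

Any afterJob()-style work would still have to run inside an OutputCommitter, as discussed above, since JobControl itself only sequences the jobs.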


Thanks

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
