hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Is there a way to re-use the output of job which was killed?
Date Fri, 25 Nov 2011 01:58:11 GMT

This should be possible. One way is:

Your custom RecordReader's initialization would need to check whether the file already exists
before it tries to create one. If it does exist, the reader simply passes 0 records through to
map(…), which gives you the behaviour you want.
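
As a rough illustration of that check, here is a minimal sketch in plain Java with no Hadoop dependency. It only models the decision logic; in a real job the same test would sit in your custom RecordReader's initialization and would use the Hadoop FileSystem API rather than java.nio. The class name SkippingReader, the one-input-file-per-split layout, and the ".out" naming convention are all assumptions made for the example, not anything Hadoop prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Dependency-free model of a "skip if already processed" reader.
 * Hypothetical stand-in for a custom Hadoop RecordReader.
 */
class SkippingReader {
    private final Iterator<String> records;
    private final boolean skipped;

    /**
     * @param inputFile file backing this split (assumed: one file per split)
     * @param outputDir output directory left behind by the earlier, killed run
     */
    SkippingReader(Path inputFile, Path outputDir) throws IOException {
        // Assumed naming convention: the output file is "<input name>.out".
        Path expectedOutput =
                outputDir.resolve(inputFile.getFileName().toString() + ".out");
        skipped = Files.exists(expectedOutput);

        // If the output is already there, expose zero records so that the
        // map function is never called for this split.
        List<String> lines = skipped
                ? Collections.<String>emptyList()
                : Files.readAllLines(inputFile);
        records = lines.iterator();
    }

    boolean wasSkipped() { return skipped; }

    boolean hasNext() { return records.hasNext(); }

    String next() { return records.next(); }
}
```

The important design point is that the check happens once, at initialization, so an already-processed split costs one existence lookup and nothing more. How you map a split to its expected output file depends on how your job names its outputs.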

You may also want to remove the output directory existence check in your FileOutputFormat subclass
(override #checkOutputSpecs).
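
The override pattern might look roughly like the sketch below. To keep it runnable without Hadoop on the classpath, StrictOutputFormat is a hypothetical stand-in for FileOutputFormat, and ResumableOutputFormat is a hypothetical name for your subclass. In Hadoop's new API the method you would actually override is FileOutputFormat#checkOutputSpecs(JobContext), and the stock method may do other setup that this sketch does not model.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Stand-in for an output format that refuses to reuse an existing directory. */
class StrictOutputFormat {
    void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }
}

/** Subclass that keeps the "is it set?" check but drops the existence check. */
class ResumableOutputFormat extends StrictOutputFormat {
    @Override
    void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        // Deliberately no existence check here: a re-run is allowed to
        // target the directory left behind by the killed job.
    }
}
```

Note that the override does not call the parent method, since that would reintroduce the check you are trying to drop. Anything else the parent does that you still need has to be repeated in the subclass.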

On 25-Nov-2011, at 5:24 AM, Samir Eljazovic wrote:

> Hi all,
> I was wondering if there is an off-the-shelf solution to re-use the output of a job that
> was killed when re-running the job.
> Here's my use case: a job (map phase only) is running and has completed 60% of its work
> before it gets killed. Output files from successfully completed tasks will have been created
> in the specified output directory. The next time I re-run this job on the same input data, I
> would like to re-use those files and skip the data that was already processed.
> Do you know if something similar exists, and what would be the right way to do it?
> Thanks,
> Samir
