hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: how to implement error thresholds in a map-reduce job ?
Date Tue, 15 Nov 2011 19:22:35 GMT
Mapred,

If you fail a task permanently upon encountering a bad situation, you basically end up failing
the job as well, automatically. By lowering the maximum number of attempts per task (say, to
1 or 2 from the default of 4 total attempts), you can also have it fail the job faster.

Is killing the job immediately a necessity? Why?

I s'pose you could call kill from within the mapper, but I've never seen that as necessary
in any situation so far. What's wrong with letting the job auto-die as a result of a failing
task?
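
To illustrate the retry point above, here is a toy model in plain Java (not Hadoop code). RetryModel, Outcome, and runTask are hypothetical names invented for this sketch. In Hadoop releases of that era the real limit is the mapred.map.max.attempts job property (default 4); newer releases call it mapreduce.map.maxattempts. The model only shows why a task that fails deterministically takes the job down once its attempts are used up, and why a lower limit gets there sooner.

```java
/** Toy model of per-task attempt limits. Hypothetical names; not the Hadoop API. */
final class RetryModel {

    /** Result of running one task under an attempt limit. */
    static final class Outcome {
        final boolean jobFailed;   // true if every attempt threw
        final int attemptsUsed;    // how many attempts were actually run

        Outcome(boolean jobFailed, int attemptsUsed) {
            this.jobFailed = jobFailed;
            this.attemptsUsed = attemptsUsed;
        }
    }

    /** Runs the task until it succeeds or maxAttempts attempts have failed. */
    static Outcome runTask(Runnable task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.run();
                return new Outcome(false, attempt); // task succeeded, job can go on
            } catch (RuntimeException e) {
                // This attempt failed; fall through and retry if attempts remain.
            }
        }
        return new Outcome(true, maxAttempts); // task failed permanently, so the job fails
    }
}
```

A task that throws on every attempt uses all 4 attempts under the default limit, but only 1 when the limit is lowered to 1, so the job fails sooner.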

On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:

> Thanks David for a step-by-step response, but this makes the error threshold a per-mapper
> threshold. Is there a way to make it per job, so that all mappers share this value and increment
> it as a shared counter?
> 
>  
> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <darose@darose.net> wrote:
> On 11/14/2011 06:06 PM, Mapred Learn wrote:
> Hi,
> 
> I have a use case where I want to pass a threshold value to a map-reduce
> job, for example error records=10.
> 
> I want the map-reduce job to fail once the total count of error records in the
> job, i.e. across all mappers, reaches this threshold.
> 
> How can I implement this, considering that each mapper processes only
> part of the input data?
> 
> Thanks,
> -JJ
> 
> 1) Pass in the threshold value as a configuration value of the M/R job
> (e.g., job.getConfiguration().setInt("error_threshold", 10) ).
> 
> 2) Make your mappers implement the Configurable interface.  This will ensure that every
> mapper gets passed a copy of the config object.
> 
> 3) When you implement the setConf() method in your mapper (which Configurable will force
> you to do), retrieve the threshold value from the config and save it in an instance variable
> in the mapper (e.g., errorThreshold = conf.getInt("error_threshold", 10); getInt() requires
> a default value as its second argument).
> 
> 4) In the mapper, when an error record occurs, increment a counter and then check whether
> the counter value has reached the threshold.  If so, throw an exception (e.g.,
> if (++numErrors >= errorThreshold) throw new RuntimeException("Too many errors") ).
> 
> The exception will kill the mapper.  Hadoop will attempt to re-run it, but subsequent
> attempts will also fail for the same reason, and eventually the entire job will fail.
> 
> HTH,
> 
> DR
> 
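
The counting logic in steps 3 and 4 can be sketched in plain Java so that it runs without a cluster. ErrorBudget and recordError are hypothetical names, not part of Hadoop. In a real mapper one would presumably build the object in setConf() from conf.getInt("error_threshold", 10) and call recordError() from map() whenever a bad record turns up; that wiring is an assumption and is not shown here. Each mapper would hold its own instance, so this remains a per-mapper threshold rather than a job-wide one.

```java
/** Per-mapper error threshold. Hypothetical helper; not part of the Hadoop API. */
final class ErrorBudget {
    private final int errorThreshold; // value a mapper would read from the job config
    private int numErrors = 0;        // errors seen so far by this one instance

    ErrorBudget(int errorThreshold) {
        if (errorThreshold < 1) {
            throw new IllegalArgumentException("errorThreshold must be at least 1");
        }
        this.errorThreshold = errorThreshold;
    }

    /** Counts one bad record and throws once the threshold is reached. */
    void recordError() {
        if (++numErrors >= errorThreshold) {
            // Inside a mapper, this exception would fail the task attempt.
            throw new RuntimeException("Too many errors: " + numErrors);
        }
    }

    /** Number of bad records counted so far. */
    int numErrors() {
        return numErrors;
    }
}
```

With a threshold of 3, the first two calls to recordError() return normally and the third throws.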

