hadoop-common-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: A reporter thread during the reduce stage for a long running line
Date Fri, 09 Jan 2009 08:16:04 GMT

On Jan 9, 2009, at 12:09 AM, Saptarshi Guha wrote:

> Hello,
>  Sorry for the puzzling subject. I have a single long-running
> /statement/ in my reduce method, so the framework might assume my
> reduce is not responding and kill it.
> I solved the problem in the map method by subclassing MapRunner and
> running a thread that calls reporter.progress() every minute or so.
> However, the same thread does not run during the reduce. (I checked this
> by setting a status string in the thread. It did not appear on the
> JobTracker web page during the reduce stage, but it did appear during
> the map stage.)
> Hadoop v0.20 appears to solve this by having separate run methods for
> both Map and Reduce, but I'm using v0.19.
> I scanned the Streaming source and it only subclasses MapRunner, so I
> assume it too has the same limitation. (I'm probably wrong; if so, can
> someone point me to the location?)
> Is there a way around this, /without/ starting a thread in the  
> reduce function?
> Hadoop v 0.19
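The heartbeat workaround described in the question (a background thread calling reporter.progress() at a fixed interval while the real work runs) can be sketched in plain Java as follows. This is a minimal sketch, not the poster's code. The Progressable interface is a local stand-in for org.apache.hadoop.util.Progressable, which Hadoop's Reporter extends. runWithHeartbeat is a hypothetical helper name. Calling it from inside reduce() is exactly the kind of thread the poster hoped to avoid, but it does not depend on MapRunner.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Heartbeat {

    /** Local stand-in for org.apache.hadoop.util.Progressable (assumption: same single method). */
    public interface Progressable {
        void progress();
    }

    /**
     * Hypothetical helper: runs one long, blocking call while a background
     * daemon thread calls reporter.progress() every intervalMs milliseconds.
     * The heartbeat is stopped as soon as the call returns or throws.
     */
    public static <T> T runWithHeartbeat(Progressable reporter, long intervalMs,
                                         Callable<T> longCall) throws Exception {
        ScheduledExecutorService pinger = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true); // never keep the JVM alive on its own
            return t;
        });
        // First ping immediately, then one every intervalMs.
        ScheduledFuture<?> beat = pinger.scheduleAtFixedRate(
                reporter::progress, 0, intervalMs, TimeUnit.MILLISECONDS);
        try {
            return longCall.call();
        } finally {
            beat.cancel(false);
            pinger.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger pings = new AtomicInteger();
        String result = runWithHeartbeat(pings::incrementAndGet, 50, () -> {
            Thread.sleep(300); // stands in for the single long-running statement
            return "done";
        });
        System.out.println(result + " after " + pings.get() + " progress() calls");
    }
}
```

The heartbeat runs on a daemon thread and is cancelled in `finally`. This limits the "always alive" signal to the one long statement, so a task that later hangs somewhere else can still be detected by the timeout.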

If you are _really_ sure you do not want the reducer to get killed  
regardless of whether it's making progress or not, set  
mapred.task.timeout to 0 in your JobConf.

Please be aware that this will mean the framework cannot detect if  
your mapper/reducer tasks are hung...
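The effect of the setting can be illustrated with a simplified model. This sketch is not Hadoop's implementation. shouldKill is a hypothetical function that captures the rule stated above: a task that reports no progress for longer than mapred.task.timeout milliseconds is killed, and a value of 0 switches the check off. The 600000 ms (10 minute) default is the value shipped in 0.19's hadoop-default.xml. In a real job, the change is a single call on the JobConf, shown in a comment below.

```java
public class TaskTimeout {

    /** Default for mapred.task.timeout in Hadoop 0.19: 600000 ms (10 minutes). */
    public static final long DEFAULT_TIMEOUT_MS = 600_000L;

    /**
     * Simplified model (hypothetical, not Hadoop's code) of the liveness check:
     * kill only if a timeout is configured and the task has been silent longer than it.
     */
    public static boolean shouldKill(long timeoutMs, long msSinceLastProgress) {
        if (timeoutMs == 0) {
            return false; // 0 disables the check: even a hung task is never killed
        }
        return msSinceLastProgress > timeoutMs;
    }

    public static void main(String[] args) {
        // In a real job (JobConf extends Configuration):
        //   conf.setLong("mapred.task.timeout", 0L);
        long silentFor = 45 * 60 * 1000L; // 45 minutes without progress()
        System.out.println(shouldKill(DEFAULT_TIMEOUT_MS, silentFor)); // true
        System.out.println(shouldKill(0L, silentFor));                 // false
    }
}
```

The second call illustrates the trade-off the reply warns about. With the timeout disabled, a task that has been silent for 45 minutes because it is genuinely hung is simply left running.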

