hadoop-common-dev mailing list archives

From: Michael Stack <st...@archive.org>
Subject: Re: Hung job
Date: Wed, 15 Mar 2006 16:19:12 GMT

I ran overnight with the patch submitted to this list yesterday that
adds a LogFormatter.resetLoggedSevere.  Twice during the night the
TaskTracker was restarted because map outputs failed checksum when a
reducer came in to pick up map output parts.  Each time the TaskTracker
came back up... eventually.  The interesting thing was that it took 9
and 12 restarts respectively, as the TaskTracker would restart anew
because we didn't have the map output an incoming reducer was asking for
(I'm assuming the incoming reducer had not yet been updated by the
jobtracker about the new state of affairs).

This situation is a big improvement over how things used to work, but it
seems as though we should try to avoid the TaskTracker start/stop churn.
Two options:

1. Add a damper so the TaskTracker keeps its head down for a while, so
it's not around when reducers come looking for missing map outputs, or
2. Not have the map output file code log severe if the taskid of the map
part being requested is not one the TaskTracker knows about.

Neither of the above is very pretty.  Any other suggestions?  Otherwise 
I'll look into a patch to do a variation on 2. above.
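
A rough sketch of what a variation on 2. might look like, assuming (as
described above) that it is the logged severe that leads to the restart.
The class, its method names, and the task id strings are all made up for
illustration and are not the actual TaskTracker code; it only shows the
idea of picking the log level by whether the requested taskid is known.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Hadoop: decides how loudly to log a
// failed map output read based on whether this tracker knows the task.
final class MapOutputRequestCheck {
    private static final Logger LOG =
        Logger.getLogger(MapOutputRequestCheck.class.getName());

    // Task ids of map tasks this tracker instance has actually run.
    private final Set<String> knownMapTasks = ConcurrentHashMap.newKeySet();

    // Record a map task once this tracker has run it.
    void registerMapTask(String taskId) {
        knownMapTasks.add(taskId);
    }

    // SEVERE only when we should have had the output; a request for a
    // task we never ran (e.g. from before a restart) is just a WARNING.
    Level levelForFailedRead(String taskId) {
        return knownMapTasks.contains(taskId) ? Level.SEVERE : Level.WARNING;
    }

    // Log the failure at the level chosen above.
    void reportFailedRead(String taskId, Exception cause) {
        LOG.log(levelForFailedRead(taskId),
                "Could not serve map output for " + taskId, cause);
    }
}
```

With something like this, a stale reducer asking for output from before
the restart would produce a warning rather than a severe, so it should
not by itself kick off another restart.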


Doug Cutting wrote:
> stack wrote:
>> Yes. Sounds like right thing to do. Minor comments in the below. 
>> Meantime, let me try it.
> Great.  Please report on whether this works for you.
>> Should there be a 'throw e;' after TaskTracker.LOG.log above?
> Yes.  You're right, there should be.
> Cheers,
> Doug
