hadoop-common-user mailing list archives

From stack <st...@archive.org>
Subject Re: Disappearing TaskTrackers (Was: Pulsing TaskTrackers)
Date Sat, 04 Mar 2006 03:33:18 GMT
Doug Cutting wrote:
> stack wrote:
>> Is there a configurable timeout that says how long jobtrackers wait 
>> on communique from tasktrackers?
It looks like it's my job that's the problem.  I've moved to new hardware 
and OS, and the /tmp dir is smaller there.  It was over-filling with 
temporary files as the job ran, and the job was failing silently.  Now 
the tasktrackers stick around.

Upping the hardcoded timeout from 60 seconds to ten minutes also helped.  
I see some tasks in jobtracker.jsp with times-since-last-communication 
north of 60 seconds that subsequently recover.  Perhaps I should add a 
patch that makes this configurable?
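
A minimal sketch of what such a patch might look like, assuming the 
timeout is read from a key/value configuration with the old 60 seconds 
as the fallback.  The class name, the property name and the use of 
java.util.Properties are all illustrative assumptions, not the actual 
JobTracker code:

```java
import java.util.Properties;

public class ExpiryConfig {
    // Default matches the previously hardcoded 60 seconds.
    static final long DEFAULT_EXPIRY_MS = 60L * 1000L;

    // Hypothetical property name, for illustration only.
    static final String EXPIRY_KEY = "tasktracker.expiry.interval.ms";

    // Returns the configured expiry interval, falling back to the
    // default when the property is missing, malformed or not positive.
    static long expiryInterval(Properties conf) {
        String raw = conf.getProperty(EXPIRY_KEY);
        if (raw == null) {
            return DEFAULT_EXPIRY_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_EXPIRY_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_EXPIRY_MS;
        }
    }

    // True when a tracker has been silent for longer than the interval.
    static boolean isExpired(long lastSeenMs, long nowMs, long expiryMs) {
        return nowMs - lastSeenMs > expiryMs;
    }
}
```

With the property set to 600000, a tracker last heard from 70 seconds 
ago would no longer be treated as lost, which matches the recovering 
tasks seen in jobtracker.jsp.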

> I think I see what the problem is.  The job jar is copied out of dfs 
> to the local filesystem in the top-level loop of the tasktracker, not 
> in the TaskRunner, which runs as a separate thread.  This can cause 
> tasktrackers to time out.  So we should move that part of 
> localizeTask() into TaskRunner.run() to avoid this.
Perhaps.  My job jar is large.  It's Nutch and then some.
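
The idea of moving the slow copy out of the top-level loop can be 
sketched roughly as below.  This is only a simulation under stated 
assumptions: localize() stands in for copying the job jar out of dfs, 
the loop stands in for the tasktracker's heartbeat loop, and none of 
the names are the real TaskTracker or TaskRunner code:

```java
public class LocalizeInRunner {
    // Stand-in for the slow copy of a large job jar to local disk.
    static void localize(long copyMillis) throws InterruptedException {
        Thread.sleep(copyMillis);
    }

    // Hypothetical runner thread: does the copy itself, so the
    // caller's loop is never blocked on it.
    static class Runner extends Thread {
        private final long copyMillis;
        volatile boolean localized = false;

        Runner(long copyMillis) {
            this.copyMillis = copyMillis;
        }

        @Override
        public void run() {
            try {
                localize(copyMillis);
                localized = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Simulated top-level loop: starts the runner, then keeps
    // "heartbeating" while the copy is in progress.  Returns the
    // number of heartbeats sent during the copy.
    static int heartbeatsDuringCopy(long copyMillis, long heartbeatMillis) {
        Runner runner = new Runner(copyMillis);
        runner.start();
        int beats = 0;
        try {
            while (runner.isAlive()) {
                beats++; // a real tracker would contact the jobtracker here
                Thread.sleep(heartbeatMillis);
            }
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return beats;
    }
}
```

If the copy were done inline in the loop instead, no heartbeats would 
go out until it finished, which for a large jar could exceed the 
timeout.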


> Also, it is rather confusing that there are two classes named 
> TaskInProgress, one nested in TaskTracker and one used by the 
> JobTracker...
> Doug
