hadoop-common-user mailing list archives

From Ben Reed <br...@yahoo-inc.com>
Subject Re: Multiple tasktrackers per node
Date Thu, 25 May 2006 16:19:37 GMT
My task_zoom.patch fixes the "10 sec delay before getting another
task when a task completes" bug, although that fix is a rather minor
part of the patch. Basically, the TaskTracker now updates the
JobTracker as soon as a task completes instead of waiting for the
next heartbeat. The patch also fixes a second bug: the JobTracker
counted all of a tracker's tasks rather than just the running ones,
which in some cases caused a delay longer than 10 secs.
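
To make the two problems concrete, here is a minimal sketch of my own.
It is not code from task_zoom.patch or from Hadoop: the class, the enum
and the method names are all hypothetical, and the 10000 ms heartbeat
is taken only from the discussion in this thread. It compares counting
every task against counting only running tasks when deciding whether a
tracker has a free slot, and it computes how long a slot sits idle if
the tracker only talks to the JobTracker on a fixed heartbeat.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hadoop.
class TaskSlotSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    // Buggy accounting: every task ever reported occupies a slot,
    // so finished tasks keep the tracker looking full.
    static int freeSlotsCountingAll(List<State> tasks, int maxTasks) {
        return Math.max(0, maxTasks - tasks.size());
    }

    // Corrected accounting: only RUNNING tasks occupy a slot.
    static int freeSlotsCountingRunning(List<State> tasks, int maxTasks) {
        int running = 0;
        for (State s : tasks) {
            if (s == State.RUNNING) {
                running++;
            }
        }
        return Math.max(0, maxTasks - running);
    }

    // Idle time of a slot whose task finishes at finishMs when the
    // tracker only contacts the JobTracker at multiples of heartbeatMs.
    // Reporting immediately on completion would make this zero.
    static long idleMsWithPeriodicPoll(long finishMs, long heartbeatMs) {
        long sinceLastBeat = finishMs % heartbeatMs;
        return sinceLastBeat == 0 ? 0 : heartbeatMs - sinceLastBeat;
    }

    public static void main(String[] args) {
        List<State> tasks = Arrays.asList(
            State.RUNNING, State.SUCCEEDED, State.SUCCEEDED, State.SUCCEEDED);
        System.out.println(freeSlotsCountingAll(tasks, 4));
        System.out.println(freeSlotsCountingRunning(tasks, 4));
        System.out.println(idleMsWithPeriodicPoll(12000L, 10000L));
    }
}
```

With one running and three finished tasks and a maximum of 4, the
first method reports no free slots while the second reports three. A
task that finishes 2 seconds after a heartbeat leaves its slot idle
for the remaining 8 seconds of the interval.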


On May 25, 2006, at 8:57 AM, Doug Cutting wrote:

> Gianlorenzo Thione wrote:
>> Thanks for the answer. So far I am still trying to understand how
>> each tasktracker gets multiple map or reduce tasks to be executed
>> simultaneously. I have run a simple job with 53 map tasks on 5
>> nodes, and at all times each node was executing a single task.
>> Each cluster node is a 4 core machine, so theoretically this was
>> a 16-node cluster and I feel that the resources were actually
>> underutilized. Am I missing something? Is there a parameter for a
>> minimum number of tasks to be executed in parallel (I found a
>> parameter for setting a maximum [which I set to 4])? If I run 4
>> TaskTrackers per node then each node gets a map task at the same
>> time and execution seems overall much faster.
>
> The task tracker can currently get starved for work when tasks  
> complete too quickly.  This is a bug that will hopefully be fixed  
> soon.  The problem is that the task tracker only polls for a new  
> task once per heartbeat (10 seconds).  Instead it should poll for  
> new tasks as soon as tasks complete.  As a short-term workaround  
> you can decrease the heartbeat interval to one second in  
> MRConstants.java.  With smaller clusters (< 100 machines) that  
> should not cause any problems.
> Doug
