hadoop-common-user mailing list archives

From Amar Kamat <ama...@yahoo-inc.com>
Subject Re: Data-local tasks
Date Tue, 01 Jul 2008 04:39:39 GMT
Saptarshi Guha wrote:
> Hello,
> I recall asking this question before, but this is in addition to what I've already asked.
> Firstly, to recap my question and Arun's specific response:
> On May 20, 2008, at 9:03 AM, Saptarshi Guha wrote:
> > Hello,
> > Does the "Data-local map tasks" counter mean the number of tasks
> > that had the input data already present on the machine they
> > are running on, i.e. there wasn't a need to ship the data to them?
>
> Response from Arun:
> > Yes. Your understanding is correct. More specifically, it means that
> > the map task got scheduled on a machine on which one of the
> > replicas of its input-split block was present and was served by
> > the datanode running on that machine. *smile* Arun
> Now, is Hadoop designed to schedule a map task on a machine that has
> one of the replicas of its input split block?
> Failing that, does it then assign the map task to a machine close to one
> that contains a replica of its input split block?
The scheduling is tasktracker-based rather than split-based. By that,
I mean that the tasktracker asks for a task and the JobTracker (JT)
assigns a task to that tracker.
If there is any split that is data-local to the tasktracker and not yet
scheduled, it will be assigned to the tracker. If no such split can be
found, the JT will assign the highest-priority pending split to it. The
priority among the splits is based on the order in which the job client
supplies them. By default they are sorted by split size (in decreasing
order). A split is either data-local (on the same machine), rack-local
(within the same rack), or non-local. There is no other measure of
closeness. The scheduling problem is 'given a tasktracker, find the best
split' rather than 'given a split, find the best/closest tracker'.
> Are there any performance metrics for this?
> Many thanks
> Saptarshi
> Saptarshi Guha | saptarshi.guha@gmail.com | http://www.stat.purdue.edu/~sguha
