hadoop-common-user mailing list archives

From "heyongqiang" <heyongqi...@software.ict.ac.cn>
Subject Re: Data-local tasks
Date Tue, 01 Jul 2008 05:07:49 GMT
Hadoop has not implemented a clever task scheduler. When a data node sends a heartbeat to the
namenode and the data node wants a job, the scheduler simply gets one for it.
The selection does not consider the task's input file at all.
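
To make the distinction concrete, here is a minimal sketch contrasting a selection that ignores the input file with one that prefers a node holding a replica of the input split, then a node on the same rack. This is not Hadoop's code: `PendingTask`, `pick_any`, `pick_preferring_local`, and the `rack_of` mapping are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingTask:
    task_id: str
    # Hosts assumed to hold a replica of this task's input split block.
    replica_hosts: frozenset


def pick_any(pending, host):
    """Locality-blind selection: hand out the first pending task, if any."""
    return pending[0] if pending else None


def pick_preferring_local(pending, host, rack_of):
    """Prefer a task with a replica on `host`, then on the same rack, then any task."""
    if not pending:
        return None
    # 1. Data-local: a replica of the input split lives on the requesting host.
    for task in pending:
        if host in task.replica_hosts:
            return task
    # 2. Rack-local: a replica lives on some host in the same rack.
    host_rack = rack_of.get(host)
    if host_rack is not None:
        for task in pending:
            if any(rack_of.get(h) == host_rack for h in task.replica_hosts):
                return task
    # 3. Fall back to any pending task; its input must be shipped over the network.
    return pending[0]
```

With the first function the requesting host plays no role in the choice; with the second, a task is handed out non-locally only when no local or same-rack candidate is pending.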

Best regards,
Yongqiang He

From: Saptarshi Guha
Sent: 2008-06-30 21:12:24
To: core-user@hadoop.apache.org
Subject: Data-local tasks

I recall asking this question, but this is in addition to what I've asked.
Firstly, to recap my question and Arun's specific response:

-- On May 20, 2008, at 9:03 AM, Saptarshi Guha wrote:
-- Hello,
-- Does the "Data-local map tasks" counter mean the number of tasks that had their input
data already present on the machine they are running on?
-- i.e. there wasn't a need to ship the data to them.

Response from Arun

-- Yes. Your understanding is correct. More specifically, it means that the map-task got scheduled
on a machine on which one of the
-- replicas of its input-split-block was present and was served by the datanode running on
that machine. *smile* Arun

Now, is Hadoop designed to schedule a map task on a machine which has one of the replicas
of its input split block?

Failing that, does it then assign the map task to a machine close to one that contains a replica
of its input split block?

Are there any performance metrics for this?

Many thanks


Saptarshi Guha | saptarshi.guha@gmail.com | http://www.stat.purdue.edu/~sguha