hadoop-common-user mailing list archives

From Nigel Daley <nda...@yahoo-inc.com>
Subject Re: Task allocation to TaskTrackers
Date Wed, 14 Feb 2007 18:05:38 GMT
Hi Vasiliy :)

> I have a question regarding task allocation to TaskTrackers (could  
> not find an answer in the docs). When a MapReduce job is run, does  
> the system attempt to schedule a Map task on a machine that  
> contains a replica of the task's input data, or not?

Yes, the JobTracker attempts to schedule the map on a node containing  
that map's input split.

> If yes, how does the system know which TaskTracker corresponds to  
> which DataNode (by IP  address, by host name, or by something else)?

See InputSplit.getLocations()
(http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/InputSplit.java?view=markup).
Currently, host names are used, but I believe it's moving to IP
addresses (see https://issues.apache.org/jira/browse/HADOOP-985).
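
As a rough illustration of matching by host name, here is a minimal
sketch. It does not use the Hadoop classes: the String[] stands in for
what InputSplit.getLocations() returns, and HostMatchSketch, isLocal,
and the example.com host names are all hypothetical. The
case-insensitive comparison is my assumption, not something taken from
the JobTracker code.

```java
public class HostMatchSketch {
    // Returns true if the TaskTracker's host name appears among the
    // host names reported for the split (hypothetical helper).
    static boolean isLocal(String trackerHost, String[] splitLocations) {
        for (String host : splitLocations) {
            // Assumption: host names are compared ignoring case.
            if (host.equalsIgnoreCase(trackerHost)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the array returned by InputSplit.getLocations().
        String[] locations = {"node1.example.com", "node7.example.com"};
        System.out.println(isLocal("node7.example.com", locations)); // prints "true"
        System.out.println(isLocal("node3.example.com", locations)); // prints "false"
    }
}
```

A plain string comparison like this is also why the choice between
host names and IP addresses matters: both sides have to report the
same form of the name for the match to succeed.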

> Also, what happens if that fails?

The task is scheduled elsewhere.  However, now that DataNodes are
aware of the rack they are on (as of 0.11.0), the JobTracker needs to
be modified so that its fallback is to attempt to place the map on a
node "close" to its data (on the same rack).
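
To make the proposed fallback order concrete, here is a small sketch
of what such a policy might look like: node-local first, then
same-rack, then anything. This is not the JobTracker code.
LocalityFallbackSketch, PendingMap, pick, and the rackOf map (host
name to rack name) are all hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalityFallbackSketch {
    /** Hypothetical stand-in for a pending map and the hosts holding its split. */
    static final class PendingMap {
        final String id;
        final List<String> hosts;

        PendingMap(String id, String... hosts) {
            this.id = id;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Picks a map for the given tracker: node-local, then rack-local, then any. */
    static PendingMap pick(String trackerHost, List<PendingMap> pending,
                           Map<String, String> rackOf) {
        // 1. Prefer a map whose split is stored on the tracker's own host.
        for (PendingMap m : pending) {
            if (m.hosts.contains(trackerHost)) {
                return m;
            }
        }
        // 2. Otherwise prefer a map with a replica on the tracker's rack.
        String rack = rackOf.get(trackerHost);
        if (rack != null) {
            for (PendingMap m : pending) {
                for (String h : m.hosts) {
                    if (rack.equals(rackOf.get(h))) {
                        return m;
                    }
                }
            }
        }
        // 3. Otherwise take any pending map (null if there is none).
        return pending.isEmpty() ? null : pending.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("a1", "rackA");
        rackOf.put("a2", "rackA");
        rackOf.put("b1", "rackB");

        List<PendingMap> pending = new ArrayList<>();
        pending.add(new PendingMap("map-0", "b1"));
        pending.add(new PendingMap("map-1", "a2"));

        System.out.println(pick("b1", pending, rackOf).id); // prints "map-0" (node-local)
        System.out.println(pick("a1", pending, rackOf).id); // prints "map-1" (rack-local)
    }
}
```

The linear scans keep the sketch short; a real scheduler would
presumably index pending maps by host and rack instead.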

Cheers,
Nige
