hadoop-common-user mailing list archives

From "Hairong Kuang" <hair...@yahoo-inc.com>
Subject RE: rack-awareness for hdfs
Date Tue, 18 Sep 2007 19:12:25 GMT
>   1. The network topology is currently hdfs centric and needs to be
generalized. There is a jira for this. 
This is no longer an issue. HADOOP-1266 has a patch that has been committed.

Hairong
-----Original Message-----
From: Owen O'Malley [mailto:oom@yahoo-inc.com] 
Sent: Tuesday, September 18, 2007 9:38 AM
To: hadoop-user@lucene.apache.org
Subject: Re: rack-awareness for hdfs

On Sep 18, 2007, at 9:28 AM, Ted Dunning wrote:

>
> The key here is that the task farm need not coincide exactly with the 
> storage farm.

On a large run where the hdfs and map/reduce clusters are identical, we see
very high (95%) mapper locality. However, it is usually the case that the
hdfs cluster is larger than the map/reduce cluster, so it would be good to
make the map placement rack-aware; that is a recognized goal.

There are a couple of issues with the goal:
   1. The network topology is currently hdfs centric and needs to be
generalized. There is a jira for this.
   2. The filesystem interface needs to provide rack and node placement
information.
   3. The input split interface needs to be generalized to deal with racks
as well as nodes.
   4. The job tracker needs to use the rack information when placing map
tasks.
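
A minimal sketch of how items 2 through 4 might fit together, assuming each
input split carries (rack, node) pairs for its replicas and the scheduler
prefers node-local work, then rack-local work, then anything else. The names
Location, Split, locality and pick_split are hypothetical and exist only for
illustration; they are not Hadoop APIs, and this is not the job tracker's
actual scheduling logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical types for illustration only; these are not Hadoop classes.
@dataclass(frozen=True)
class Location:
    rack: str  # e.g. "/rack1"
    node: str  # e.g. "host-a"


@dataclass(frozen=True)
class Split:
    name: str
    replicas: Tuple[Location, ...]  # where copies of this split's data live


def locality(split: Split, worker: Location) -> int:
    """Return 0 for node-local, 1 for rack-local, 2 for off-rack."""
    if any(r.node == worker.node for r in split.replicas):
        return 0
    if any(r.rack == worker.rack for r in split.replicas):
        return 1
    return 2


def pick_split(pending: List[Split], worker: Location) -> Optional[Split]:
    """Pick the pending split whose data is closest to the given worker."""
    if not pending:
        return None
    # min() returns the first split with the lowest locality score.
    return min(pending, key=lambda s: locality(s, worker))
```

With a task node that stores no hdfs data, no split is ever node-local, so
the rack-local tier is what keeps reads off the inter-rack links.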

It is not on my short term radar, but it is on the medium term radar.  
However, patches are welcome! *smile*

-- Owen

