hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: HDFS architecture based on GFS?
Date Fri, 27 Feb 2009 13:50:00 GMT
kang_min82 wrote:
> Hello Matei,
> Which TaskTracker did you mean here?
> I don't understand that. In general we have many TaskTrackers, and each of
> them runs on a separate Datanode. Why doesn't the JobTracker talk directly
> to the Namenode for a list of Datanodes and then perform the MapReduce
> tasks there?

1. There's no requirement for a 1:1 mapping of task-trackers to
datanodes. You could bring up TTs on any machine with spare CPU cycles
on your network, talking to a long-lived filesystem built from a few

2. There's no requirement for HDFS. You could have a cluster of 
MapReduce nodes talking to other filesystems. Locality of data helps
performance, but it is not required.
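To make points 1 and 2 concrete, here is a minimal sketch of locality-preferring task placement. It assumes a scheduler that knows which hosts hold replicas of each task's input and which TaskTracker hosts are free. The function names (`pick_tracker`, `schedule`) and host names are hypothetical; this is not the JobTracker's actual scheduling code. A data-local tracker is chosen when one is free; otherwise the task still runs on any free tracker and reads its input over the network.

```python
from typing import Dict, List, Optional, Set


def pick_tracker(replica_hosts: Set[str], free_trackers: List[str]) -> Optional[str]:
    """Choose a TaskTracker host for one map task (hypothetical helper).

    Prefers a free tracker that holds a replica of the task's input
    (data-local); otherwise falls back to the first free tracker
    (remote read). Returns None only if no tracker is free.
    """
    for host in free_trackers:
        if host in replica_hosts:
            return host  # data-local: input can be read from local disk
    # No co-located tracker: locality is a preference, not a requirement.
    return free_trackers[0] if free_trackers else None


def schedule(tasks: Dict[str, Set[str]], trackers: List[str]) -> Dict[str, Optional[str]]:
    """Assign each task (task name -> replica hosts) to a tracker.

    Simplifying assumption: each tracker has exactly one free slot.
    """
    free = list(trackers)
    plan: Dict[str, Optional[str]] = {}
    for task, hosts in tasks.items():
        chosen = pick_tracker(hosts, free)
        plan[task] = chosen
        if chosen is not None:
            free.remove(chosen)  # slot is now occupied
    return plan
```

With trackers `["cpu1", "dn1"]`, a task whose replicas live on `dn1` lands there, while a task whose only replica is on `dn3` (which runs no TaskTracker at all) still gets scheduled on `cpu1`.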

3. Layering makes for cleaner code.
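Point 3 can be illustrated with a small sketch, assuming a hypothetical `BlockStore` interface (not Hadoop's real `FileSystem` API) that sits between the job layer and storage. The job code only calls `read` and `hosts_for`, so an HDFS-like store that reports replica hosts and a store that reports none can be swapped without touching it; with the second store the scheduler simply gets no locality hints.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class BlockStore(ABC):
    """Hypothetical storage interface; the job layer sees only this."""

    @abstractmethod
    def read(self, path: str) -> str:
        """Return the contents of a file."""

    @abstractmethod
    def hosts_for(self, path: str) -> List[str]:
        """Return hosts holding the file's data; [] means no locality hint."""


class ReplicatedStore(BlockStore):
    """Stand-in for an HDFS-like store that tracks replica placement."""

    def __init__(self, files: Dict[str, str], placement: Dict[str, List[str]]):
        self.files = files
        self.placement = placement

    def read(self, path: str) -> str:
        return self.files[path]

    def hosts_for(self, path: str) -> List[str]:
        return self.placement.get(path, [])


class PlainStore(BlockStore):
    """Stand-in for some other filesystem that exposes no placement data."""

    def __init__(self, files: Dict[str, str]):
        self.files = files

    def read(self, path: str) -> str:
        return self.files[path]

    def hosts_for(self, path: str) -> List[str]:
        return []


def run_word_count(
    store: BlockStore, paths: List[str]
) -> Tuple[Dict[str, int], Dict[str, List[str]]]:
    """Job-layer code: depends only on BlockStore, not on a concrete store.

    Returns the word counts plus the per-file locality hints a scheduler
    could use when placing tasks.
    """
    hints = {p: store.hosts_for(p) for p in paths}
    counts: Dict[str, int] = {}
    for p in paths:
        for word in store.read(p).split():
            counts[word] = counts.get(word, 0) + 1
    return counts, hints
```

The word counts come out the same whichever store is passed in; only the locality hints differ, which is the kind of separation the layering buys.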
