hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "DataNode" by SteveLoughran
Date Tue, 05 Aug 2008 09:20:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/DataNode

The comment on the change is:
creating a page

New page:
A DataNode stores data in the [HadoopFileSystem]. A functional filesystem has more than one
DataNode, with data replicated across them.

On startup, a DataNode connects to the NameNode, spinning until that service comes up. It
then responds to requests from the NameNode for filesystem operations.
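
The "spin until the service comes up" behaviour can be pictured as a plain retry loop. The following is only a minimal sketch, not the DataNode's actual code: the class `ConnectRetrySketch`, the `Attempt` interface, and the `spinUntilUp` method are hypothetical names made up for illustration, and the connection attempt is simulated rather than made against a real NameNode.

```java
import java.io.IOException;

final class ConnectRetrySketch {
    /** Hypothetical stand-in for "try to reach the NameNode once". */
    interface Attempt<T> {
        T connect() throws IOException;
    }

    /**
     * Keeps calling attempt.connect() until it stops throwing IOException,
     * sleeping sleepMillis between failed attempts, then returns the result.
     */
    static <T> T spinUntilUp(Attempt<T> attempt, long sleepMillis) throws InterruptedException {
        while (true) {
            try {
                return attempt.connect();
            } catch (IOException e) {
                // Service not reachable yet: wait, then try again.
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated NameNode that is unreachable for the first two attempts.
        String result = spinUntilUp(() -> {
            if (++calls[0] < 3) {
                throw new IOException("NameNode not up yet");
            }
            return "connected";
        }, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A loop of this shape means the startup order of the two services does not matter, at the cost of a DataNode that waits indefinitely if the NameNode never appears.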

Client applications can talk directly to a DataNode once the NameNode has provided the location
of the data. Similarly, MapReduce operations farmed out to TaskTracker instances near a DataNode
talk directly to the DataNode to access the files. TaskTracker instances can, and indeed should,
be deployed on the same servers that host DataNode instances, so that MapReduce operations
are performed close to the data.
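
The two-step read described above (ask the NameNode where the data is, then fetch it from a DataNode) can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `DirectReadSketch`, `NameNodeStub`, `DataNodeStub`, and `read` are hypothetical names, not Hadoop classes, and real block transfer happens over the network rather than through map lookups.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DirectReadSketch {
    /** Hypothetical DataNode stand-in: holds block contents keyed by block id. */
    static final class DataNodeStub {
        final Map<String, String> blocks = new HashMap<>();
    }

    /** Hypothetical NameNode stand-in: holds only locations, never file data. */
    static final class NameNodeStub {
        final Map<String, List<String>> fileBlocks = new HashMap<>(); // path -> block ids
        final Map<String, List<String>> blockHosts = new HashMap<>(); // block id -> hosts
    }

    /**
     * Reads a file by asking the NameNode stub for block locations and then
     * fetching each block from a DataNode stub, preferring one on localHost.
     */
    static String read(String path, String localHost, NameNodeStub nameNode,
                       Map<String, DataNodeStub> cluster) {
        StringBuilder out = new StringBuilder();
        for (String blockId : nameNode.fileBlocks.get(path)) {
            List<String> hosts = nameNode.blockHosts.get(blockId);
            // Prefer a replica on the same server, as a co-located TaskTracker would.
            String chosen = hosts.contains(localHost) ? localHost : hosts.get(0);
            out.append(cluster.get(chosen).blocks.get(blockId));
        }
        return out.toString();
    }
}
```

In this sketch the NameNode stub is consulted only for metadata; the bytes themselves always come from a DataNode stub, which is why placing the reader on the same host as a replica keeps the read local.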

DataNode instances can talk to each other, which is what they do when they are replicating
data. 

 * There is usually no need to use RAID storage for DataNode data, because the data is designed
to be replicated across multiple servers, rather than multiple disks on the same server.
 * An ideal configuration is for a server to have a DataNode, a TaskTracker, and then one physical
disk and one TaskTracker slot per CPU. This gives every TaskTracker slot 100% of a CPU, and
separate disks to read and write data.
 * Avoid using ["NFS"] for data storage in production systems.
