hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "FrontPage" by SameerParanjpye
Date Tue, 22 Aug 2006 21:06:11 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by SameerParanjpye:
http://wiki.apache.org/lucene-hadoop/FrontPage

The comment on the change is:
Added HDFS intro to front page

------------------------------------------------------------------------------
  Having many map and reduce tasks enables good load balancing and allows failed tasks to be
  re-run with small runtime overhead.
  
- == Architecture ==
+ === Architecture ===
  
  The Hadoop Map/Reduce framework has a master/slave architecture. It has a single master
  server or ''jobtracker'' and several slave servers or ''tasktrackers'', one per node in the cluster.
@@ -54, +54 @@

  jobtracker and also handle data motion between the map and reduce phases.
  
  
+ == Hadoop DFS ==
+ 
+ Hadoop's Distributed File System is designed to reliably store very large files across
+ machines in a large cluster.  It is inspired by the
+ [http://labs.google.com/papers/gfs.html Google File System]. Hadoop DFS stores each file
+ as a sequence of blocks; all blocks in a file except the last are the same size.
+ Blocks belonging to a file are replicated for fault tolerance. The block size and replication
+ factor are configurable per file. Files in HDFS are "write once" and have strictly one writer
+ at any time.
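
As a rough illustration of the block layout described above (this is not Hadoop code), the following Python sketch computes the length of each block for a file of a given size. The function name `block_lengths` and the byte counts in the usage note are assumptions chosen for the example.

```python
def block_lengths(file_length, block_size):
    """Return the length of each block for a file of file_length bytes.

    Every block except possibly the last is exactly block_size bytes.
    Hypothetical helper for illustration; not part of Hadoop.
    """
    if file_length < 0 or block_size <= 0:
        raise ValueError("file_length must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_length, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)  # the last, shorter block
    return lengths
```

For instance, `block_lengths(300, 128)` gives two full 128-byte blocks followed by one 44-byte block, while a file whose length is an exact multiple of the block size has no short final block.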
+ 
+ === Architecture ===
+ 
+ Like Hadoop Map/Reduce, HDFS follows a master/slave architecture. An HDFS installation
+ consists of a single ''Namenode'', a master server that manages the filesystem namespace
+ and regulates access to files by clients. In addition, there are a number of ''Datanodes'',
+ one per node in the cluster, which manage storage attached to the nodes that they run on.
+ The ''Namenode'' exposes filesystem namespace operations, such as opening, closing, and
+ renaming files and directories, via an RPC interface. It also determines the mapping of
+ blocks to ''Datanodes''. The ''Datanodes'' are responsible for serving read and write
+ requests from filesystem clients; they also perform block creation, deletion, and replication
+ upon instruction from the ''Namenode''.
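
The division of labour described above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the class name `ToyNamenode`, its methods, and the round-robin placement policy are all hypothetical, and the text does not say how the real ''Namenode'' chooses ''Datanodes''.

```python
class ToyNamenode:
    """Illustrative stand-in for a Namenode's block-to-Datanode map.

    Hypothetical model; it does not reflect Hadoop's actual classes or RPCs.
    """

    def __init__(self, datanodes):
        if not datanodes:
            raise ValueError("at least one datanode is required")
        self.datanodes = list(datanodes)
        self.block_map = {}  # block id -> list of datanode names
        self._next = 0       # round-robin cursor (assumed placement policy)

    def allocate_block(self, block_id, replication):
        """Choose `replication` distinct datanodes to hold a new block."""
        n = len(self.datanodes)
        if not 1 <= replication <= n:
            raise ValueError("replication must be between 1 and the number of datanodes")
        targets = [self.datanodes[(self._next + i) % n] for i in range(replication)]
        self._next = (self._next + 1) % n
        self.block_map[block_id] = targets
        return targets

    def locate(self, block_id):
        """Return the datanodes a client would contact to read a block."""
        return self.block_map[block_id]
```

In this sketch the master only records where each block lives; a client would ask it for locations and then read or write the block data directly on the listed nodes, mirroring the split between namespace management and storage described in the text.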
  
  
  == General Information ==
