hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "DFS requirements" by KonstantinShvachko
Date Tue, 15 Aug 2006 17:26:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:
http://wiki.apache.org/lucene-hadoop/DFS_requirements

------------------------------------------------------------------------------
  == List of projects: ==
  
   1. '''Re-factoring.''' Develop abstractions for the DFS components, with each component represented
by an interface that specifies its functionality and its interaction with the other components. With
good abstractions, it should be easy to add new features without compromising reliability.
The abstractions should be evaluated with required future features in mind. [[BR]] ~-For example,
data nodes might have a block transfer object, a block receive object, etc., with carefully
defined behavior, coordinated by a top-level control structure, instead of the morass of methods
in the data node at present.-~
-  2. (Reliability) '''Robust name node checkpointing''' and namespace edits logging. [[BR]]
''Currently the system is not restorable in case of name node hardware failure.'' [[BR]] DFS
should store “image” and “edits” files on a local name node disk and replicate them
on backup nodes using a simple streaming protocol.
+  2. (Reliability) '''Robust name node checkpointing''' and namespace edits logging. [[BR]]
''Currently the system is not restorable in case of name node hardware failure.'' [[BR]] DFS
should store “image” and “edits” files on a local name node disk and replicate them
on backup nodes using a simple streaming protocol. [[BR]][http://issues.apache.org/jira/browse/HADOOP-332
HADOOP-332] ''In progress''.
   3. (Reliability) Define the '''startup process''': what is done by each component, and in which
order. Introduce the concept of a '''“safe mode”''', in which the name node makes no block
replication/removal decisions and does not change the state of the namespace in any way. The
name node stays in safe mode until a configurable number of data nodes have started and reported
a configurable percentage of the data blocks to the name node. [[BR]][http://issues.apache.org/jira/browse/HADOOP-306 HADOOP-306],
[http://issues.apache.org/jira/browse/HADOOP-250 HADOOP-250] ''In progress''.
   4. (Reliability) The name node '''checkpoint should store a list of data nodes''' (each serving
a distinct data storage) that have ever reported to the name node. Namely, the following is stored
for each data node in the cluster:  [[BR]] <host:port; storageID; time of last heartbeat;
user id>. [[BR]] Missing nodes should be reported in the DFS UI and during startup.
See also 3.a. [[BR]][http://issues.apache.org/jira/browse/HADOOP-306 HADOOP-306] ''In progress''.
   5. (Reliability) Nodes with '''read-only disks''' should report the problem to the name
node and shut themselves down if all their local disks are unavailable. [[BR]][http://issues.apache.org/jira/browse/HADOOP-163
HADOOP-163] __Done__.
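
For illustration of the component abstractions in item 1, here is a minimal sketch of what the data node's block transfer and block receive objects might look like as interfaces. All names (BlockTransfer, BlockReceiver, transferBlock, receiveBlock) are hypothetical and do not exist in the current code; the sketch only shows the shape of the abstraction, not an actual design.

{{{
// Hypothetical data node component interfaces (item 1). Names are illustrative only;
// in practice each interface would live in its own source file.
interface BlockTransfer {
  /** Stream the bytes of the given block to a target data node. */
  void transferBlock(long blockId, String targetHost, int targetPort) throws java.io.IOException;
}

interface BlockReceiver {
  /** Accept an incoming block and write it to local storage. */
  void receiveBlock(long blockId, java.io.InputStream in) throws java.io.IOException;
}

// A small top-level control structure would coordinate such objects, instead of the
// data node keeping all of this logic in one class.
}}}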
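
Item 2 asks DFS to replicate the “image” and “edits” files to backup nodes over a simple streaming protocol. Below is a minimal sketch of such a one-way stream, assuming a plain TCP socket to the backup node; the class name, buffer size and calling convention are assumptions, not the HADOOP-332 implementation.

{{{
import java.io.*;
import java.net.Socket;

// Hypothetical sketch of streaming the local "edits" file to a backup node (item 2).
class EditsStreamer {
  static void replicate(File edits, String backupHost, int backupPort) throws IOException {
    try (Socket s = new Socket(backupHost, backupPort);
         InputStream in = new FileInputStream(edits);
         OutputStream out = s.getOutputStream()) {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);   // forward each chunk unchanged to the backup node
      }
      out.flush();
    }
  }
}
}}}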
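
The safe mode described in item 3 boils down to a threshold check: stay in safe mode until enough data nodes have reported enough blocks. The sketch below expresses that check; the class, fields and configuration values are made up for illustration and are not tied to the HADOOP-306 or HADOOP-250 work.

{{{
// Hypothetical safe mode check (item 3). While inSafeMode() returns true, the name node
// makes no replication/removal decisions and does not change the namespace.
class SafeModeMonitor {
  private final int minDataNodes;        // configurable number of data nodes that must report
  private final double minBlockFraction; // configurable fraction of blocks that must be reported, e.g. 0.95

  SafeModeMonitor(int minDataNodes, double minBlockFraction) {
    this.minDataNodes = minDataNodes;
    this.minBlockFraction = minBlockFraction;
  }

  boolean inSafeMode(int reportedDataNodes, long reportedBlocks, long totalBlocks) {
    if (reportedDataNodes < minDataNodes) return true;
    if (totalBlocks == 0) return false;               // empty namespace: nothing to wait for
    return (double) reportedBlocks / totalBlocks < minBlockFraction;
  }
}
}}}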
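
Item 4 fixes the per-data-node fields the checkpoint should keep: <host:port; storageID; time of last heartbeat; user id>. A plain record holding exactly those fields could look like the following; the class name and the "missing node" test are assumptions added for illustration.

{{{
// Hypothetical per-data-node checkpoint record (item 4).
class DataNodeRecord {
  final String hostPort;    // "host:port" of the data node
  final String storageID;   // identifier of the distinct data storage the node serves
  long lastHeartbeat;       // time of last heartbeat, in milliseconds since the epoch
  final String userId;      // user id the data node runs as

  DataNodeRecord(String hostPort, String storageID, long lastHeartbeat, String userId) {
    this.hostPort = hostPort;
    this.storageID = storageID;
    this.lastHeartbeat = lastHeartbeat;
    this.userId = userId;
  }

  // A recorded node that has not reported within the expiry window is "missing" and
  // would be flagged in the DFS UI and during startup.
  boolean isMissing(long now, long expiryMillis) {
    return now - lastHeartbeat > expiryMillis;
  }
}
}}}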
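
For item 5, one simple way a data node could detect a read-only or failed disk is to probe each data directory for writability. The sketch below does that with a temporary file; the probing method and class name are assumptions, not the HADOOP-163 code. If no directory survives the probe, the data node would report the condition and shut itself down, as the item requires.

{{{
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local disk check for item 5: keep only the data directories that are writable.
class DiskChecker {
  static List<File> usableDirs(File[] dataDirs) {
    List<File> ok = new ArrayList<File>();
    for (File dir : dataDirs) {
      try {
        File probe = File.createTempFile("probe", ".tmp", dir);  // fails on a read-only disk
        probe.delete();
        ok.add(dir);
      } catch (IOException e) {
        // this directory would be reported to the name node as read-only / failed
      }
    }
    return ok;
  }
}
}}}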
