hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "DFS requirements" by KonstantinShvachko
Date Sat, 01 Jul 2006 01:02:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:
http://wiki.apache.org/lucene-hadoop/DFS_requirements

------------------------------------------------------------------------------
      a. List data node blocks and the files they belong to.
      a. Report the name node configuration parameters.
      a. History of data node failures, restarts, etc.
+  11. (Scalability) Nodes with '''multiple disks''' should manage data distribution across their local disks internally (see the volume-selection sketch after this list).
+  12. (Scalability) '''Select-based communication''' for the DFS name node (see the NIO sketch after this list).
-  11. (Functionality) Currently, if we want to remove x nodes from the DFS cluster, we must remove them at most two at a time and wait for re-replication to complete, with no feedback on its progress. It would be better to specify a list of nodes to remove, have their data re-replicated while they are still online, and get a confirmation on completion.
+  13. (Functionality) Currently, if we want to remove x nodes from the DFS cluster, we must remove them at most two at a time and wait for re-replication to complete, with no feedback on its progress. It would be better to specify a list of nodes to remove, have their data re-replicated while they are still online, and get a confirmation on completion (see the decommission sketch after this list).
-  12. (Scalability) Nodes with '''multiple disks''' should manage data distribution across their local disks internally.
-  13. (Scalability) '''Select-based communication''' for the DFS name node.
   14. (Specification) Define '''invariants for read and append''' commands: a formalization of the DFS consistency model with its underlying assumptions and the resulting guarantees (a sample formalization follows this list).
   15. (Performance) Checksum data should not be stored as a separate DFS '''crc-file''', but rather maintained by each data node for every locally stored block copy. This will reduce name node operations and improve read data locality for maps.
      a. '''CRC scanning'''. We should dedicate up to 1% of the disk bandwidth on a data node to reading back the blocks and validating their CRCs. The results should be logged and reported in the DFS UI (see the throttled-scan sketch after this list).
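
The sketches below only illustrate the requirements above; they are not existing DFS code, and every class, method, and parameter name in them is hypothetical.

For item 11, one minimal way a data node could spread new blocks across its local disks, assuming a hypothetical RoundRobinVolumeChooser consulted whenever a block file is created:

{{{
import java.io.File;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Picks a local data directory for each new block in round-robin order,
 * skipping volumes without enough free space. Illustrative only; a real
 * data node would also track per-volume usage and disk failures.
 */
public class RoundRobinVolumeChooser {
  private final List<File> volumes;           // configured local data directories
  private final AtomicInteger next = new AtomicInteger(0);

  public RoundRobinVolumeChooser(List<File> volumes) {
    this.volumes = volumes;
  }

  /** Returns a volume with at least blockSize bytes free, or null if none. */
  public File chooseVolume(long blockSize) {
    for (int i = 0; i < volumes.size(); i++) {
      int idx = Math.abs(next.getAndIncrement() % volumes.size());
      File v = volumes.get(idx);
      if (v.getUsableSpace() >= blockSize) {
        return v;
      }
    }
    return null;                              // every volume is full
  }
}
}}}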
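
For item 12, a skeleton of what select-based communication in the name node could look like: one thread multiplexes all data node and client connections through java.nio instead of dedicating a thread per connection. The port and the request dispatch are placeholders:

{{{
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/** Single-threaded select loop serving many connections at once. */
public class SelectServer {
  public static void main(String[] args) throws IOException {
    Selector selector = Selector.open();
    ServerSocketChannel server = ServerSocketChannel.open();
    server.socket().bind(new InetSocketAddress(9000));   // placeholder port
    server.configureBlocking(false);
    server.register(selector, SelectionKey.OP_ACCEPT);

    ByteBuffer buf = ByteBuffer.allocate(8192);
    while (true) {
      selector.select();                                 // wait for ready channels
      Iterator<SelectionKey> it = selector.selectedKeys().iterator();
      while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isAcceptable()) {                        // new connection
          SocketChannel ch = server.accept();
          ch.configureBlocking(false);
          ch.register(selector, SelectionKey.OP_READ);
        } else if (key.isReadable()) {                   // request bytes arrived
          SocketChannel ch = (SocketChannel) key.channel();
          buf.clear();
          if (ch.read(buf) < 0) {
            ch.close();                                  // peer disconnected
          }
          // ... decode the request and dispatch to the name node here ...
        }
      }
    }
  }
}
}}}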
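
For item 13, the requested workflow could surface as an admin-facing interface on the name node; this is purely a hypothetical shape for it:

{{{
import java.util.List;

/**
 * Hypothetical decommissioning interface illustrating the requested
 * workflow: mark a list of nodes, re-replicate their blocks while the
 * nodes stay online, and give the operator explicit completion feedback.
 */
public interface DecommissionProtocol {
  /** Mark the given data nodes for removal; they keep serving reads. */
  void startDecommission(List<String> nodeNames);

  /** Fraction of the marked nodes' blocks already re-replicated elsewhere. */
  float decommissionProgress();

  /** True once every affected block has enough replicas on remaining nodes. */
  boolean isDecommissionComplete();
}
}}}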
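
For item 14, one possible shape such a formalization could take, written here only as an illustration and not as the agreed DFS consistency model:

{{{
% Let close(f, n) denote a successful close of file f at length n, and
% read(f, i) the i-th byte later returned to any reader.

% Read invariant: data visible after a successful close is stable.
\forall f, n .\; \mathit{close}(f, n) \Rightarrow
    \forall i < n .\; \mathit{read}(f, i) = \mathit{written}(f, i)

% Append invariant: acknowledged appends are atomic and ordered, so the
% file contents are always a concatenation of a prefix of them.
\forall f .\; \mathit{contents}(f) = r_1 \parallel \cdots \parallel r_k,
    \text{ where } r_1, \ldots, r_k \text{ is a prefix of the acknowledged appends to } f
}}}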
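
For item 15.a, the 1% budget can be met with a simple duty cycle: after each read, the scanner sleeps roughly 99 times as long as the read took. A sketch, with the comparison against the stored per-block checksum and the UI reporting left to the caller:

{{{
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

/**
 * Re-reads a stored block and computes its CRC32 while consuming about
 * 1% of the disk bandwidth. The caller compares the returned value with
 * the checksum kept for the block copy and logs/reports any mismatch.
 */
public class ThrottledCrcScanner {
  public static long scan(String blockFile)
      throws IOException, InterruptedException {
    CRC32 crc = new CRC32();
    byte[] chunk = new byte[64 * 1024];
    try (FileInputStream in = new FileInputStream(blockFile)) {
      while (true) {
        long start = System.nanoTime();
        int n = in.read(chunk);
        long readMillis = (System.nanoTime() - start) / 1000000;
        if (n < 0) {
          break;                               // end of block file
        }
        crc.update(chunk, 0, n);
        // ~1% duty cycle: pause about 99x as long as the read itself took.
        Thread.sleep(Math.max(1, readMillis * 99));
      }
    }
    return crc.getValue();
  }
}
}}}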
