hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "DFS requirements" by KonstantinShvachko
Date Sat, 15 Jul 2006 01:36:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:

     * ~-On our current installation there are 32TB of data in 55,000 files and folders.
Scaling 32TB to 10PB, under the assumption that the average file size remains the same,
gives an estimate of 18,000,000 files.-~
   * Number of concurrent clients – 100 thousand.
     * ~-If on a 10,000-node cluster each node has one task tracker running 4 tasks (the
current map/reduce defaults), then we need to support 40,000 simultaneous clients.-~
+  * Acceptable level of data loss – 1 hour.
+    * ~-Any data created or updated in DFS at least 1 hour ago is guaranteed to be recoverable
in case of a system failure.-~
+  * Acceptable downtime level – 2 hours.
+    * ~-A DFS failure requires manual system recovery. The system is guaranteed to be available
again no later than 2 hours after the recovery starts.-~
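
The scaling and concurrency figures above follow from simple arithmetic. A minimal sketch checking them — the input figures are taken from the requirements themselves; the class and method names are illustrative only:

```java
public class ScaleEstimates {

    // Scale the file count with the data size, assuming the average
    // file size stays the same while the data grows.
    static long estimatedFiles(long filesNow, long tbNow, long tbTarget) {
        return filesNow * (tbTarget / tbNow);
    }

    // One task tracker per node, each running a fixed number of tasks.
    static long concurrentClients(long nodes, long tasksPerNode) {
        return nodes * tasksPerNode;
    }

    public static void main(String[] args) {
        // 32 TB over 55,000 files today, scaled to 10 PB (= 10 * 1024 TB):
        System.out.println(estimatedFiles(55000, 32, 10 * 1024)); // 17600000, i.e. roughly the 18M estimate
        // 10,000 nodes, 4 tasks per task tracker (current m/r default):
        System.out.println(concurrentClients(10000, 4));          // 40000
    }
}
```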
  == Feature requirements: ==
@@ -102, +105 @@

  27. (Interoperability) '''Slim client''' design. The majority of the current client logic
should be moved into a data node, called the “primary”, which controls data transfer to
the other nodes, lease extension, and confirmation or failure reporting when required.
A thin client makes it easier to keep the Java, C, etc. client implementations in sync. [[BR]]
''Currently the client plays the role of the primary node.''
  28. (Interoperability) Implement '''Web``Dav and NFS server''' mounting capabilities for
DFS.
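
As a rough illustration of the slim-client design in item 27, here is a hypothetical sketch in which the client hands a block to a “primary” data node that drives replication, lease renewal, and the final acknowledgement. All names and the API shape here are assumptions, not actual DFS interfaces:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical primary data node: it, not the client, owns the pipeline.
class PrimaryNode {
    private final List<String> downstream;                // other replica nodes in the pipeline
    private final List<String> sentTo = new ArrayList<String>();
    private int leaseRenewals = 0;

    PrimaryNode(List<String> downstream) { this.downstream = downstream; }

    // The slim client's only job is to hand the block to the primary once;
    // the primary then drives the rest of the transfer.
    boolean writeBlock(byte[] data) {
        for (String node : downstream) {
            sentTo.add(node);                             // real code would stream `data` here
        }
        leaseRenewals++;                                  // the primary, not the client, renews the lease
        return true;                                      // confirmation reported back to the client
    }

    int leaseRenewals() { return leaseRenewals; }
    List<String> sentTo() { return sentTo; }
}
```

Because all pipeline logic lives on the primary, the Java, C, etc. clients reduce to a single "send block, await confirmation" call, which is what keeps them easy to hold in sync.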
+ == Projects yet to be prioritized: ==
+  * Data nodes should store per-block metadata in order to make a '''partial
metadata recovery''' possible in case the name node checkpoint information is lost.
+     a. The "original" file name or id, an invariant preserved during file renames, could
be stored in an additional file associated with each block, like the crc-file (see 15).
+     a. Block offset (the block sequence number) could be encoded as a part of the block
id, e.g., [[BR]] {{{<block id> = <random unique per file #><block sequence #>}}}
+     a. Adding concurrent file append and truncate features will require a block generation
number to be stored as a part of the block file name.
+  * (Specification) Define '''recovery/failover and software upgrade procedures'''. 
+     a. The recovery of the cluster is manual; a document describing the steps for safe
cluster recovery after a name node failure is desired.
+     a. Based on the recovery procedures, estimate the downtime of the cluster when the name
node fails.
+     a. A document is needed describing the general procedures for transitioning DFS from
one software version to another.
+  * Design a '''DFS backup scheme'''. [[BR]] ~-The backup is intended to prevent data
loss caused by file system software bugs, particularly during system upgrades.-~ [[BR]]
~-The backup might not need to store the entire data set; some applications require backing
up just a fraction of critical data, from which the rest can be effectively restored.-~
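
The block id encoding proposed in the partial-metadata-recovery item above — a random per-file number concatenated with the block sequence number, plus a generation number carried in the block file name — could look roughly like the following. The field widths and the `blk_<id>_<generation>` naming are assumptions for illustration, not part of the requirements:

```java
// Illustrative block id layout: high bits hold a random per-file number,
// low bits hold the block's sequence number within the file.
public class BlockId {
    private static final int SEQ_BITS = 16;               // assumed: up to 65,536 blocks per file
    private static final long SEQ_MASK = (1L << SEQ_BITS) - 1;

    static long encode(long fileNumber, int sequence) {
        return (fileNumber << SEQ_BITS) | (sequence & SEQ_MASK);
    }

    static long fileNumber(long blockId) { return blockId >>> SEQ_BITS; }
    static int sequence(long blockId)    { return (int) (blockId & SEQ_MASK); }

    // With concurrent append/truncate, a generation number would additionally
    // be encoded in the block file name (naming scheme assumed here).
    static String blockFileName(long blockId, long generation) {
        return "blk_" + blockId + "_" + generation;
    }
}
```

Because the sequence number is recoverable from the id alone, a data node can reconstruct each block's position in its file without consulting the name node checkpoint.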
