hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "HDFS-RAID" by PatrickKling
Date Wed, 27 Oct 2010 18:56:02 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HDFS-RAID" page has been changed by PatrickKling.
The comment on this change is: added sections on BlockFixer and RaidShell.
http://wiki.apache.org/hadoop/HDFS-RAID?action=diff&rev1=2&rev2=3

--------------------------------------------------

   * the DRFS client, which provides application access to the files in the DRFS and transparently
recovers any corrupt or missing blocks encountered when reading a file,
   * the RaidNode, a daemon that creates and maintains parity files for all data files stored
in the DRFS,
   * the BlockFixer, which periodically recomputes blocks that have been lost or corrupted,
-  * the RaidFsck utility, which allows the administrator to manually trigger the recomputation
of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
+  * the RaidShell utility, which allows the administrator to manually trigger the recomputation
of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
  
  === DRFS client ===
  
@@ -33, +33 @@

  
  It is important to note that while the DRFS client recomputes missing blocks when reading
corrupt files, it does not
  insert these missing blocks back into the file system. Instead, it discards them once the
application request has been fulfilled.
- The BlockFixer daemon and the RaidFsck tool can be used to persistently fix bad blocks.
+ The BlockFixer daemon and the RaidShell tool can be used to persistently fix bad blocks.
  
  === RaidNode ===
  
@@ -55, +55 @@

  
  (currently under development)
  
- The BlockFixer is a daemon that runs at the RaidNode
+ The BlockFixer is a daemon that runs at the RaidNode and periodically inspects the health
of the paths for which DRFS is configured.
+ When a file with missing or corrupt blocks is encountered, these blocks are recomputed and
inserted back into the file system.
  
- === RaidFsck ===
+ There are two implementations of the BlockFixer:
+  * the LocalBlockFixer, which recomputes bad blocks locally at the RaidNode.
+  * the DistributedBlockFixer, which dispatches MapReduce jobs to recompute blocks.
+ 
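
The choice between the two implementations would typically be made in the cluster configuration. As a hedged sketch (the property name `raid.blockfix.classname` and the class name below are assumptions and may not match your version), selecting the distributed implementation might look like:

```xml
<!-- Sketch only: the property name and class name are assumptions. -->
<property>
  <name>raid.blockfix.classname</name>
  <value>org.apache.hadoop.raid.DistributedBlockFixer</value>
</property>
```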
+ === RaidShell ===
  
  (currently under development)
+ 
+ The RaidShell is a tool that allows the administrator to maintain and inspect a DRFS. It
supports commands for manually triggering the 
+ recomputation of bad data blocks and also allows the administrator to display a list of
irrecoverable files (i.e., files for which too
+ many data or parity blocks have been lost).
+ 
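
As a rough sketch of how such commands might be invoked (the class name follows the pattern of the RaidNode invocation shown later on this page, but the flag names here are assumptions and may differ by version):

```shell
# Hypothetical RaidShell invocations; flag names are assumptions.
# Trigger recomputation of missing or corrupt blocks for a file:
hadoop org.apache.hadoop.raid.RaidShell -recoverBlocks /user/foo/file
# Display the list of irrecoverable files under a path:
hadoop org.apache.hadoop.raid.RaidShell -fsck /user/foo
```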
  
  == Using HDFS RAID ==
  
@@ -199, +209 @@

  </property>
    }}}
  
- === Administration ===
+ === Running DRFS ===
  
 The DRFS provides support for administration at runtime without
 any downtime to cluster services. It is possible to add or delete paths to be raided without
- interrupting any load on the cluster. If you change raid.xml, its contents will be
- reload within seconds and the new contents will take effect immediately.
+ interrupting any load on the cluster. Changes to `raid.xml` are detected periodically (every
few seconds)
+ and new policies are applied immediately.
  
  Designate one machine in your cluster to run the RaidNode software. You can run this daemon
on any machine, irrespective of whether that machine is running any other Hadoop daemon.
  You can start the RaidNode by running the following on the selected machine:
+ {{{
  nohup $HADOOP_HOME/bin/hadoop org.apache.hadoop.raid.RaidNode >> /xxx/logs/hadoop-root-raidnode-hadoop.xxx.com.log
&
+ }}}
  
- Optionally, we provide two scripts to start and stop the RaidNode. Copy the scripts
+ We also provide two scripts to start and stop the RaidNode more easily. Copy the scripts
- start-raidnode.sh and stop-raidnode.sh to the directory $HADOOP_HOME/bin in the machine
+ `start-raidnode.sh` and `stop-raidnode.sh` to the directory `$HADOOP_HOME/bin` on the machine
- you would like to deploy the daemon. You can start or stop the RaidNode by directly
+ where the RaidNode is to be deployed. You can then start or stop the RaidNode by directly
- callying the scripts from that machine. If you want to deploy the RaidNode remotely,
+ calling these scripts on that machine. To deploy the RaidNode remotely,
- copy start-raidnode-remote.sh and stop-raidnode-remote.sh to $HADOOP_HOME/bin at
+ copy `start-raidnode-remote.sh` and `stop-raidnode-remote.sh` to `$HADOOP_HOME/bin` at
  the machine from which you want to trigger the remote deployment and create a text
- file $HADOOP_HOME/conf/raidnode at the same machine containing the name of the server
+ file `$HADOOP_HOME/conf/raidnode` on the same machine containing the name of the machine
- where the RaidNode should run. These scripts run ssh to the specified machine and
+ where the RaidNode should be deployed. These scripts ssh to the specified machine and
- invoke start/stop-raidnode.sh there. As an example, you might want to change
+ invoke `start-raidnode.sh`/`stop-raidnode.sh` there.
+ 
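Concretely, staging the host file for remote deployment might look like the following sketch (`HADOOP_HOME` and the host name are placeholders; adapt them to your installation):

```shell
# Sketch: stage the file that tells the remote-deployment scripts where
# the RaidNode should run. HADOOP_HOME and the host name are placeholders.
HADOOP_HOME=${HADOOP_HOME:-/tmp/hadoop-home}
mkdir -p "$HADOOP_HOME/conf" "$HADOOP_HOME/bin"
echo "raidnode01.example.com" > "$HADOOP_HOME/conf/raidnode"
# start-raidnode-remote.sh would now ssh to raidnode01.example.com and
# invoke start-raidnode.sh there:
# "$HADOOP_HOME"/bin/start-raidnode-remote.sh
```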
+ For ease of maintenance, you might want to change
- start-mapred.sh in the JobTracker machine so that it automatically calls
+ `start-mapred.sh` on the JobTracker machine so that it automatically calls
- start-raidnode-remote.sh (and do the equivalent thing for stop-mapred.sh and
+ `start-raidnode-remote.sh` (and make a similar change to `stop-mapred.sh` to call
- stop-raidnode-remote.sh).
+ `stop-raidnode-remote.sh`).
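
One way to wire this up is simply to append the remote-start call to the end of `start-mapred.sh`, as in this sketch (paths are placeholders; verify against your installation):

```shell
# Sketch: make starting MapReduce also start the RaidNode remotely.
# HADOOP_HOME is a placeholder for your Hadoop installation directory.
HADOOP_HOME=${HADOOP_HOME:-/tmp/hadoop-home}
mkdir -p "$HADOOP_HOME/bin"
touch "$HADOOP_HOME/bin/start-mapred.sh"
# Append the remote-start call so starting MapReduce also deploys the RaidNode:
echo '"$HADOOP_HOME"/bin/start-raidnode-remote.sh' >> "$HADOOP_HOME/bin/start-mapred.sh"
# Make the equivalent change to stop-mapred.sh, calling stop-raidnode-remote.sh.
```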
  
+ To monitor the health of a DRFS, use the fsck command provided by the RaidShell.
- Run fsckraid periodically (being developed as part of another JIRA). This validates parity
- blocks of a file.
  
  
  
