Mailing-List: contact common-commits-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-dev@hadoop.apache.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
From: Apache Wiki <wikidiffs@apache.org>
To: Apache Wiki <wikidiffs@apache.org>
Date: Wed, 27 Oct 2010 18:56:02 -0000
Message-ID: <20101027185602.54892.41732@eosnew.apache.org>
Subject: [Hadoop Wiki] Update of "HDFS-RAID" by PatrickKling

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HDFS-RAID" page has been changed by PatrickKling.
The comment on this change is: added sections on BlockFixer and RaidShell.
http://wiki.apache.org/hadoop/HDFS-RAID?action=diff&rev1=2&rev2=3

--------------------------------------------------

   * the DRFS client, which provides application access to the files in the DRFS and transparently recovers any corrupt or missing blocks encountered when reading a file,
   * the RaidNode, a daemon that creates and maintains parity files for all data files stored in the DRFS,
   * the BlockFixer, which periodically recomputes blocks that have been lost or corrupted,
-  * the RaidFsck utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
+  * the RaidShell utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.

  === DRFS client ===

@@ -33, +33 @@


  It is important to note that while the DRFS client recomputes missing blocks when reading corrupt files, it does not
  insert these missing blocks back into the file system. Instead, it discards them once the application request has been fulfilled.
- The BlockFixer daemon and the RaidFsck tool can be used to persistently fix bad blocks.
+ The BlockFixer daemon and the RaidShell tool can be used to persistently fix bad blocks.
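The recomputation the DRFS client performs can be pictured with a parity code. The snippet below is a minimal sketch that assumes a simple XOR parity scheme with one parity block per stripe; that scheme, the function name `xor_blocks` and the toy block contents are illustrative assumptions, not a description of the DRFS client's actual code.

```shell
#!/bin/sh
# Sketch: recovering one lost block of a stripe from XOR parity.
# ASSUMPTION: one parity block per stripe, computed as the bytewise XOR of
# the stripe's data blocks. "Blocks" are short lists of byte values here,
# not real HDFS blocks.

# xor_blocks A B: print the element-wise XOR of two space-separated byte lists
xor_blocks() {
  a=$1 out=""
  set -- $2                  # positional parameters now hold block B
  for x in $a; do
    out="$out $(( $x ^ $1 ))"
    shift
  done
  echo "${out# }"
}

block0="1 2 3 4"
block1="16 32 64 128"
block2="255 0 255 0"

# Parity block: block0 XOR block1 XOR block2
parity=$(xor_blocks "$(xor_blocks "$block0" "$block1")" "$block2")

# If block1 is lost, XOR the parity with the surviving data blocks
recovered=$(xor_blocks "$(xor_blocks "$parity" "$block0")" "$block2")
echo "recovered block1: $recovered"
```

With a single XOR parity block per stripe, this approach can rebuild at most one missing block per stripe; losing two blocks of the same stripe leaves that data irrecoverable under this scheme.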

  === RaidNode ===

@@ -55, +55 @@


  (currently under development)

- The BlockFixer is a daemon that runs at the RaidNode
+ The BlockFixer is a daemon that runs at the RaidNode and periodically inspects the health of the paths for which DRFS is configured.
+ When a file with missing or corrupt blocks is encountered, these blocks are recomputed and inserted back into the file system.

- === RaidFsck ===
+ There are two implementations of the BlockFixer:
+  * the LocalBlockFixer, which recomputes bad blocks locally at the RaidNode.
+  * the DistributedBlockFixer, which dispatches MapReduce jobs to recompute blocks.
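The BlockFixer's behaviour described above (periodic inspection of the configured paths, followed by recomputation of bad blocks) can be sketched as a polling loop. This is a hypothetical illustration in POSIX sh: `RAIDED_PATHS`, `find_bad_files` and `recompute_blocks` are invented stand-ins, and the loop is bounded only so the example terminates.

```shell
#!/bin/sh
# Sketch of a BlockFixer-style loop: periodically inspect the configured
# paths and repair every file reported as having bad blocks.
# ASSUMPTION: RAIDED_PATHS, find_bad_files and recompute_blocks are
# hypothetical stand-ins, not part of DRFS.

RAIDED_PATHS="/user/foo/data /user/bar/logs"   # hypothetical raided paths

# Hypothetical health check stub: print the files under $1 that have
# missing or corrupt blocks (a fixed example for one path here).
find_bad_files() {
  if [ "$1" = /user/foo/data ]; then
    echo /user/foo/data/part-00000
  fi
}

# Hypothetical repair step: rebuild the bad blocks of $1 from parity.
recompute_blocks() {
  echo "fixed $1"
}

# block_fixer ROUNDS INTERVAL: run ROUNDS inspection passes, sleeping
# INTERVAL seconds between passes (a real daemon would loop forever).
# Paths are split on whitespace, so they must not contain spaces.
block_fixer() {
  rounds=$1 interval=$2
  while [ "$rounds" -gt 0 ]; do
    for path in $RAIDED_PATHS; do
      for file in $(find_bad_files "$path"); do
        recompute_blocks "$file"
      done
    done
    rounds=$((rounds - 1))
    if [ "$rounds" -gt 0 ]; then
      sleep "$interval"
    fi
  done
}

block_fixer 1 0
```

The two implementations would differ only in the repair step: the LocalBlockFixer would rebuild blocks on the RaidNode itself, whereas the DistributedBlockFixer would hand that work to MapReduce jobs.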
+
+ === RaidShell ===

  (currently under development)
+
+ The RaidShell is a tool that allows the administrator to maintain and inspect a DRFS. It supports commands for manually triggering the
+ recomputation of bad data blocks and also allows the administrator to display a list of irrecoverable files (i.e., files for which too
+ many data or parity blocks have been lost).
+

  == Using HDFS RAID ==

@@ -199, +209 @@

    </property>
    }}}

- === Administration ===
+ === Running DRFS ===

  The DRFS provides support for administration at runtime without
  any downtime to cluster services. It is possible to add or remove paths to be raided without
- interrupting any load on the cluster. If you change raid.xml, its contents will be
- reload within seconds and the new contents will take effect immediately.
+ interrupting any load on the cluster. Changes to `raid.xml` are detected periodically (every few seconds)
+ and new policies are applied immediately.
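One way to get the "detected every few seconds" behaviour is to poll the configuration file's modification time. The sketch below assumes an mtime comparison against a marker file; `reload_policies`, `check_once` and `watch_config` are hypothetical names, and the RaidNode may detect changes differently.

```shell
#!/bin/sh
# Sketch: poll a policy file such as raid.xml and reload it when it changes.
# ASSUMPTION: change detection by comparing the file's modification time
# with that of a marker file; reload_policies is a hypothetical stand-in.
# The RaidNode's own mechanism may differ.

reload_policies() {
  echo "reloading $1"
}

# check_once CONF MARKER: if CONF was modified after MARKER, reload it and
# copy CONF's modification time onto MARKER.
check_once() {
  if [ "$1" -nt "$2" ]; then
    reload_policies "$1"
    touch -r "$1" "$2"
  fi
}

# watch_config CONF INTERVAL: check CONF every INTERVAL seconds, forever.
watch_config() {
  marker=$(mktemp) || return 1
  touch -r "$1" "$marker"      # treat the current contents as already loaded
  while :; do
    sleep "$2"
    check_once "$1" "$marker"
  done
}
```

For example, `watch_config /path/to/raid.xml 5 &` would pick up an edited file within roughly five seconds. Polling keeps the mechanism simple at the cost of a short delay between saving the file and the new policies taking effect.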

  Designate one machine in your cluster to run the RaidNode software. You can run this daemon
  on any machine irrespective of whether that machine is running any other Hadoop daemon.
  You can start the RaidNode by running the following on the selected machine:
+ {{{
  nohup $HADOOP_HOME/bin/hadoop org.apache.hadoop.raid.RaidNode >> /xxx/logs/hadoop-root-raidnode-hadoop.xxx.com.log &
+ }}}

- Optionally, we provide two scripts to start and stop the RaidNode. Copy the scripts
+ We also provide two scripts to start and stop the RaidNode more easily. Copy the scripts
- start-raidnode.sh and stop-raidnode.sh to the directory $HADOOP_HOME/bin in the machine
+ `start-raidnode.sh` and `stop-raidnode.sh` to the directory `$HADOOP_HOME/bin` on the machine
- you would like to deploy the daemon. You can start or stop the RaidNode by directly
+ where the RaidNode is to be deployed. You can then start or stop the RaidNode by directly
- callying the scripts from that machine. If you want to deploy the RaidNode remotely,
+ calling these scripts on that machine. To deploy the RaidNode remotely,
- copy start-raidnode-remote.sh and stop-raidnode-remote.sh to $HADOOP_HOME/bin at
+ copy `start-raidnode-remote.sh` and `stop-raidnode-remote.sh` to `$HADOOP_HOME/bin` at
  the machine from which you want to trigger the remote deployment and create a text
- file $HADOOP_HOME/conf/raidnode at the same machine containing the name of the server
+ file `$HADOOP_HOME/conf/raidnode` on the same machine containing the name of the machine
- where the RaidNode should run. These scripts run ssh to the specified machine and
+ where the RaidNode should be deployed. These scripts ssh to the specified machine and
- invoke start/stop-raidnode.sh there. As an example, you might want to change
+ invoke `start-raidnode.sh`/`stop-raidnode.sh` there.
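The remote scripts' behaviour (read a host name from `$HADOOP_HOME/conf/raidnode`, then ssh to it and run `start-raidnode.sh` or `stop-raidnode.sh`) can be sketched as follows. This is a hypothetical illustration, not the shipped `start-raidnode-remote.sh`: it assumes the host name is on the first non-empty line of the file and that `$HADOOP_HOME` is the same path on both machines, and the `SSH` override variable is invented so the command can be dry-run.

```shell
#!/bin/sh
# Hypothetical sketch of what a start/stop-raidnode-remote.sh wrapper does;
# this is NOT the shipped script.
# ASSUMPTIONS: conf/raidnode holds the host name on its first non-empty line,
# HADOOP_HOME is the same path on both machines, and SSH is a hypothetical
# override (default: ssh) so the command can be dry-run with SSH=echo.

# remote_raidnode ACTION: ACTION is "start" or "stop"
remote_raidnode() {
  case $1 in
    start|stop) ;;
    *) echo "usage: remote_raidnode start|stop" >&2; return 2 ;;
  esac
  host=$(grep -v '^[[:space:]]*$' "$HADOOP_HOME/conf/raidnode" | head -n 1)
  if [ -z "$host" ]; then
    echo "no host name in $HADOOP_HOME/conf/raidnode" >&2
    return 1
  fi
  # Run the matching local script on the remote machine
  ${SSH:-ssh} "$host" "$HADOOP_HOME/bin/$1-raidnode.sh"
}
```

For example, `SSH=echo remote_raidnode start` prints the ssh target and the remote command instead of executing it.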
+
+ For easy maintenance, you might want to change
- start-mapred.sh in the JobTracker machine so that it automatically calls
+ `start-mapred.sh` on the JobTracker machine so that it automatically calls
- start-raidnode-remote.sh (and do the equivalent thing for stop-mapred.sh and
+ `start-raidnode-remote.sh` (and make a similar change to `stop-mapred.sh` to call
- stop-raidnode-remote.sh).
+ `stop-raidnode-remote.sh`).

+ To monitor the health of a DRFS, use the fsck command provided by the RaidShell.
- Run fsckraid periodically (being developed as part of another JIRA). This validates parity
- blocks of a file.
