From: Apache Wiki
To: Apache Wiki
Date: Wed, 27 Oct 2010 18:56:02 -0000
Message-ID: <20101027185602.54892.41732@eosnew.apache.org>
Subject: [Hadoop Wiki] Update of "HDFS-RAID" by PatrickKling

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HDFS-RAID" page has been changed by PatrickKling.
The comment on this change is: added sections on BlockFixer and RaidShell.
http://wiki.apache.org/hadoop/HDFS-RAID?action=diff&rev1=2&rev2=3

--------------------------------------------------

   * the DRFS client, which provides application access to the files in the DRFS and transparently recovers any corrupt or missing blocks encountered when reading a file,
   * the RaidNode, a daemon that creates and maintains parity files for all data files stored in the DRFS,
   * the BlockFixer, which periodically recomputes blocks that have been lost or corrupted,
-  * the RaidFsck utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
+  * the RaidShell utility, which allows the administrator to manually trigger the recomputation of missing or corrupt blocks and to check for files that have become irrecoverably corrupted.
  
  === DRFS client ===
  
@@ -33, +33 @@

  It is important to note that while the DRFS client recomputes missing blocks when reading corrupt files it does not
  insert these missing blocks back into the file system. Instead, it discards them once the application request has been fulfilled.
- The BlockFixer daemon and the RaidFsck tool can be used to persistently fix bad blocks.
+ The BlockFixer daemon and the RaidShell tool can be used to persistently fix bad blocks.
  
  === RaidNode ===
  
@@ -55, +55 @@

  (currently under development)
  
- The BlockFixer is a daemon that runs at the RaidNode
+ The BlockFixer is a daemon that runs at the RaidNode and periodically inspects the health of the paths for which DRFS is configured.
+ When a file with missing or corrupt blocks is encountered, these blocks are recomputed and inserted back into the file system.
  
- === RaidFsck ===
+ There are two implementations of the BlockFixer:
+  * the LocalBlockFixer, which recomputes bad blocks locally at the RaidNode.
+  * the DistributedBlockFixer, which dispatches MapReduce jobs to recompute blocks.
+ 
+ === RaidShell ===
  
  (currently under development)
+ 
+ The RaidShell is a tool that allows the administrator to maintain and inspect a DRFS. It supports commands for manually triggering the
+ recomputation of bad data blocks and also allows the administrator to display a list of irrecoverable files (i.e., files for which too
+ many data or parity blocks have been lost).
+ 
  
  == Using HDFS RAID ==
  
@@ -199, +209 @@

  > /xxx/logs/hadoop-root-raidnode-hadoop.xxx.com.log &
+ }}}
  
- Optionally, we provide two scripts to start and stop the RaidNode. Copy the scripts
+ We also provide two scripts to start and stop the RaidNode more easily. Copy the scripts
- start-raidnode.sh and stop-raidnode.sh to the directory $HADOOP_HOME/bin in the machine
+ `start-raidnode.sh` and `stop-raidnode.sh` to the directory `$HADOOP_HOME/bin` on the machine
- you would like to deploy the daemon. You can start or stop the RaidNode by directly
+ where the RaidNode is to be deployed. You can then start or stop the RaidNode by directly
- callying the scripts from that machine. If you want to deploy the RaidNode remotely,
+ calling these scripts on that machine. To deploy the RaidNode remotely,
- copy start-raidnode-remote.sh and stop-raidnode-remote.sh to $HADOOP_HOME/bin at
+ copy `start-raidnode-remote.sh` and `stop-raidnode-remote.sh` to `$HADOOP_HOME/bin` at
  the machine from which you want to trigger the remote deployment and create a text
- file $HADOOP_HOME/conf/raidnode at the same machine containing the name of the server
+ file `$HADOOP_HOME/conf/raidnode` on the same machine containing the name of the machine
- where the RaidNode should run. These scripts run ssh to the specified machine and
+ where the RaidNode should be deployed. These scripts ssh to the specified machine and
- invoke start/stop-raidnode.sh there. As an example, you might want to change
+ invoke `start-raidnode.sh`/`stop-raidnode.sh` there.
+ 
+ For easy maintenance, you might want to change
- start-mapred.sh in the JobTracker machine so that it automatically calls
+ `start-mapred.sh` on the JobTracker machine so that it automatically calls
- start-raidnode-remote.sh (and do the equivalent thing for stop-mapred.sh and
+ `start-raidnode-remote.sh` (and make a similar change to `stop-mapred.sh` to call
- stop-raidnode-remote.sh).
+ `stop-raidnode-remote.sh`).
  
+ To monitor the health of a DRFS, use the fsck command provided by the RaidShell.
- Run fsckraid periodically (being developed as part of another JIRA). This validates parity
- blocks of a file.
  
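
As a rough illustration only (this is not the script shipped with HDFS RAID), the remote start/stop behavior described in the change above could be sketched as a small shell wrapper. The function names `raidnode_host` and `remote_raidnode` and the `RAID_SSH` override are hypothetical; the only details taken from the page are the `$HADOOP_HOME/conf/raidnode` file, the use of ssh, and the `start-raidnode.sh`/`stop-raidnode.sh` script names. The sketch also assumes that the remote machine uses the same `$HADOOP_HOME` path as the local one.

```shell
#!/bin/sh
# Hypothetical sketch of a remote RaidNode wrapper; NOT the actual
# start-raidnode-remote.sh / stop-raidnode-remote.sh from HDFS RAID.

# Print the host name stored in the given file
# (first line only, with all whitespace removed).
raidnode_host() {
  head -n 1 "$1" 2>/dev/null | tr -d '[:space:]'
}

# Run start-raidnode.sh or stop-raidnode.sh on the configured machine.
# $1 must be "start" or "stop". RAID_SSH is an assumed override for the
# ssh command, useful for dry runs (for example RAID_SSH=echo).
remote_raidnode() {
  action="$1"
  case "$action" in
    start|stop) ;;
    *) echo "usage: remote_raidnode start|stop" >&2; return 2 ;;
  esac
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  host=$(raidnode_host "$HADOOP_HOME/conf/raidnode")
  if [ -z "$host" ]; then
    echo "no host name found in $HADOOP_HOME/conf/raidnode" >&2
    return 1
  fi
  # Assumption: the remote machine has the same HADOOP_HOME layout.
  ${RAID_SSH:-ssh} "$host" "$HADOOP_HOME/bin/$action-raidnode.sh"
}
```

Setting `RAID_SSH=echo` before calling `remote_raidnode start` prints the host and remote command instead of connecting, which gives a simple dry run before wiring such a call into `start-mapred.sh`.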