[Hadoop Wiki] Update of "HDFS-RAID" by dhrubaBorthakur
Date: Wed, 27 Oct 2010 05:45:04 -0000

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HDFS-RAID" page has been changed by dhrubaBorthakur.
http://wiki.apache.org/hadoop/HDFS-RAID
--------------------------------------------------
New page:

This HDFS RAID module implements a Distributed Raid File System. It is used along with an instance of the Hadoop Distributed File System (HDFS). It can be used to provide better protection against data corruption, and it can also be used to reduce the total storage requirements of HDFS.

The Distributed Raid File System consists of two main software components. The first component is the RaidNode, a daemon that creates parity files from specified HDFS files. The second component, "raidfs", is software layered over an HDFS client that intercepts all calls an application makes to the HDFS client. If HDFS encounters corrupted data while reading a file, the raidfs client detects it, uses the relevant parity blocks to recover the corrupted data (if possible), and returns the data to the application. The use of parity data to satisfy the read request is completely transparent to the application.

The primary use of this feature is to save disk space for HDFS files. HDFS typically stores data in triplicate. The Distributed Raid File System can be configured so that a set of data blocks of a file is combined to form one or more parity blocks. This allows the replication factor of an HDFS file to be reduced from 3 to 2 while keeping the failure probability roughly the same as before. This typically saves 25% to 30% of the storage space in an HDFS cluster.

INSTALLING and CONFIGURING:

The entire code is packaged in a single jar file, hadoop-*-raid.jar. To use HDFS RAID, you need to put this jar file on the CLASSPATH. The easiest way is to copy hadoop-*-raid.jar from HADOOP_HOME/build/contrib/raid to HADOOP_HOME/lib. Alternatively, you can modify HADOOP_CLASSPATH in conf/hadoop-env.sh to include this jar.
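Before moving on to configuration, the 25% to 30% savings estimate above can be checked with quick arithmetic. The stripe length of 10 and 2x parity replication below are illustrative assumptions, not values mandated by HDFS-RAID:

```python
# Back-of-the-envelope storage comparison for 100 data blocks.
# Assumptions (illustrative): stripe length 10, parity blocks
# replicated 2x, data replication reduced from 3 to 2.
data_blocks = 100
stripe_length = 10

plain = data_blocks * 3                       # plain HDFS: 3 copies of everything
parity_blocks = data_blocks // stripe_length  # one parity block per stripe
raided = data_blocks * 2 + parity_blocks * 2  # 2 copies of data + 2 of parity

savings = (plain - raided) / plain
print(f"plain={plain} raided={raided} savings={savings:.0%}")  # savings=27%
```

Note how the stripe length drives the result: a longer stripe means fewer parity blocks and therefore greater savings, at the cost of combining more data blocks into each parity block.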
There is a single configuration file named raid.xml that describes the HDFS path(s) that you want to raid. A sample of this file can be found in src/contrib/raid/conf/raid.xml. Please edit the entries in this file to list the path(s) that you want to raid. Then edit the hdfs-site.xml file for your installation to include a reference to this raid.xml. You can add the following to your hdfs-site.xml:

  <property>
    <name>raid.config.file</name>
    <value>/mnt/hdfs/DFS/conf/raid.xml</value>
    <description>This is needed by the RaidNode</description>
  </property>

Please add an entry to your hdfs-site.xml to enable hdfs clients to use the parity bits to recover corrupted data:

  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.dfs.DistributedRaidFileSystem</value>
    <description>The FileSystem for hdfs: uris.</description>
  </property>

OPTIONAL CONFIGURATION:

The following properties can be set in hdfs-site.xml to further tune your configuration.

Specify the location where parity files are stored:

  <property>
    <name>hdfs.raid.locations</name>
    <value>hdfs://newdfs.data:8000/raid</value>
    <description>The location for parity files. If this is not defined, it defaults to /raid.</description>
  </property>

Specify the parity stripe length:

  <property>
    <name>hdfs.raid.stripeLength</name>
    <value>10</value>
    <description>The number of blocks in a file to be combined into a single raid parity block. The default value is 5. The higher the number, the greater the disk space you will save when you enable raid.</description>
  </property>

Specify the size of HAR part-files:

  <property>
    <name>raid.har.partfile.size</name>
    <value>4294967296</value>
    <description>The size of HAR part files that store raid parity files. The default is 4GB. The higher the number, the fewer the files used to store the HAR archive.</description>
  </property>

Specify the block placement policy for raid:

  <property>
    <name>dfs.block.replicator.classname</name>
    <value>org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicyRaid</value>
    <description>The name of the class which specifies how to place blocks in HDFS. The class BlockPlacementPolicyRaid will try to avoid co-located replicas of the same stripe. This greatly reduces the probability of raid file corruption.</description>
  </property>

Specify the pool for the fair scheduler:

  <property>
    <name>raid.mapred.fairscheduler.pool</name>
    <value>none</value>
    <description>The name of the fair scheduler pool to use.</description>
  </property>

Specify which implementation of RaidNode to use:
  <property>
    <name>raid.classname</name>
    <value>org.apache.hadoop.raid.DistRaidNode</value>
    <description>Specify which implementation of RaidNode to use (class name).</description>
  </property>

Specify the periodicity at which the RaidNode re-calculates (if necessary) the parity blocks:

  <property>
    <name>raid.policy.rescan.interval</name>
    <value>5000</value>
    <description>Specify the periodicity in milliseconds after which all source paths are rescanned and parity blocks recomputed if necessary. By default, this value is 1 hour.</description>
  </property>

By default, the DistributedRaidFileSystem assumes that the underlying file system is the DistributedFileSystem. If you want to layer the DistributedRaidFileSystem over some other file system, then define a property named fs.raid.underlyingfs.impl that specifies the name of the underlying class. For example, if you want to layer the DistributedRaidFileSystem over an instance of the NewFileSystem, then:

  <property>
    <name>fs.raid.underlyingfs.impl</name>
    <value>org.apache.hadoop.new.NewFileSystem</value>
    <description>Specify the filesystem that is layered immediately below the DistributedRaidFileSystem. By default, this value is DistributedFileSystem.</description>
  </property>

ADMINISTRATION:

The Distributed Raid File System provides support for administration at runtime without any downtime to cluster services. It is possible to add or delete paths to be raided without interrupting any load on the cluster. If you change raid.xml, its contents will be reloaded within seconds and the new contents will take effect immediately.

Designate one machine in your cluster to run the RaidNode software. You can run this daemon on any machine, irrespective of whether that machine is running any other hadoop daemon or not. You can start the RaidNode by running the following on the selected machine:

  nohup $HADOOP_HOME/bin/hadoop org.apache.hadoop.raid.RaidNode >> /xxx/logs/hadoop-root-raidnode-hadoop.xxx.com.log &

Optionally, we provide two scripts to start and stop the RaidNode. Copy the scripts start-raidnode.sh and stop-raidnode.sh to the directory $HADOOP_HOME/bin on the machine where you would like to deploy the daemon.
You can start or stop the RaidNode by directly calling the scripts from that machine. If you want to deploy the RaidNode remotely, copy start-raidnode-remote.sh and stop-raidnode-remote.sh to $HADOOP_HOME/bin on the machine from which you want to trigger the remote deployment, and create a text file $HADOOP_HOME/conf/raidnode on the same machine containing the name of the server where the RaidNode should run. These scripts run ssh to the specified machine and invoke start/stop-raidnode.sh there. As an example, you might want to change start-mapred.sh on the JobTracker machine so that it automatically calls start-raidnode-remote.sh (and do the equivalent thing for stop-mapred.sh and stop-raidnode-remote.sh).

Run fsckraid periodically (being developed as part of another JIRA). This validates the parity blocks of a file.

IMPLEMENTATION:

The RaidNode periodically scans all the specified paths in the configuration file. For each path, it recursively scans all files that have more than 2 blocks and that have not been modified during the last few hours (the default is 24 hours). It picks the specified number of blocks from the file (as specified by the stripe size), generates a parity block by combining them, and stores the result as another HDFS file in the specified destination directory. There is a one-to-one mapping between an HDFS file and its parity file. The RaidNode also periodically finds parity files that are orphaned and deletes them.

The Distributed Raid FileSystem is layered over a DistributedFileSystem instance and intercepts all calls that go into HDFS. HDFS throws a ChecksumException or a BlockMissingException when a file read encounters bad data. The layered Distributed Raid FileSystem catches these exceptions, locates the corresponding parity file, extracts the original data from the parity file, and feeds the extracted data back to the application in a completely transparent way.
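The recovery path described above can be illustrated with a toy sketch. The actual parity computation in HDFS-RAID operates on HDFS blocks inside the RaidNode and the raidfs client; the snippet below only demonstrates the underlying XOR-parity principle on in-memory byte strings, and all names in it are illustrative:

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# A "stripe" of three data blocks and the parity block derived from it.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Simulate losing block 1, then recover it: XOR-ing the parity block
# with the surviving blocks yields the missing block.
surviving = [stripe[0], stripe[2]]
recovered = xor_blocks(surviving + [parity])
print(recovered == stripe[1])  # True
```

Plain XOR parity of this kind can recover at most one missing block per stripe, which is why it is combined with a reduced (but still greater than one) replication factor rather than replacing replication outright.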
The layered Distributed Raid FileSystem does not fix the data loss that it encounters while serving data. It merely makes the application transparently use the parity blocks to re-create the original data. A command line tool "fsckraid" is currently under development; it will fix corrupted files by extracting the data from the associated parity files. An administrator can run "fsckraid" manually as and when needed.