From: Adam Phelps <amp@opendns.com>
Date: Tue, 08 Feb 2011 11:33:13 -0800
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS drive, partition best practice

On 2/7/11 2:06 PM, Jonathan Disher wrote:
> Currently I have a 48 node cluster using Dell R710's with 12 disks - two
> 250GB SATA drives in RAID1 for the OS, and ten 1TB SATA disks as a JBOD
> (mounted on /data/0 through /data/9) and listed separately in
> hdfs-site.xml. It works... mostly. The big issue you will encounter is
> losing a disk - the DataNode process will crash, and if you comment out
> the affected drive, when you replace it you will have 9 disks full to N%
> and one empty disk.

If the DataNode is going down after a single disk failure, then you
probably haven't set dfs.datanode.failed.volumes.tolerated in
hdfs-site.xml. You can raise that number to allow the DataNode to
tolerate dead drives.

- Adam
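
As a minimal sketch, the property goes in hdfs-site.xml like this (the
value of 2 below is just an illustration - the default is 0, meaning the
DataNode shuts down on the first volume failure; pick a value that makes
sense for how many data disks each node has):

  <!-- hdfs-site.xml: let the DataNode keep running with up to two
       failed data volumes instead of crashing on the first one.
       The value 2 is only an example. -->
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>2</value>
  </property>

Note that the DataNode still needs a restart to pick up a replaced disk,
but with this set it will at least stay up serving the remaining volumes
in the meantime.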