hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Disks RAID best practice
Date Thu, 01 Nov 2012 14:49:10 GMT
Oleg, that was my overall RAID preference.

Specifically for the 'control nodes' (NameNode, Secondary NameNode, JobTracker, HBase Master,
ZooKeeper...), I tend to just use simple mirroring (RAID-1) because these processes are not
really I/O bound.

I guess you could go RAID-10 (striped and mirrored), but that may be a little overkill; my
preference for it comes from working in the RDBMS world.

If we are using commodity servers, JBOD tends to be the preferred way of handling things.

However, I've seen cases where people will use RAIDed drives on a node for a couple of reasons.
The nice thing about doing mirrored DN drives is that if you have a disk failure, you just
pop the drive and replace it.  Much simpler.

If we're looking at using NetApp's E-Series in conjunction with a compute cluster, then you
are using their RAID configuration and can reduce the cluster's replication factor from 3
to 2.

While it's easy to recommend RAID on the control nodes, the data nodes are a bit trickier.  I mean
you can run with straight JBOD and, on cost alone, it's the cheapest in terms of hardware.
If you go with RAID on the DN, you reduce your storage density per node because you have
redundancy in hardware, and this has an impact on your overall machine density and TCO.
This is offset by easier and faster recovery from some hardware failure events. Let's
face it, the number one thing to fail is going to be your hard drives.  So we are going to
have to balance the costs against the benefits.
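
To put rough numbers on that density trade-off, here is a minimal Python sketch. The cluster
size (20 nodes, 12 x 2 TB drives per node) and the function name are made up for illustration.
It assumes RAID-1 halves the usable space on a node and that HDFS replication divides what is
left, and it ignores OS, temp space, and filesystem overhead.

```python
def usable_tb(nodes, disks_per_node, disk_tb, raid_efficiency, replication):
    """Approximate unique data (in TB) a cluster can hold.

    raid_efficiency is the fraction of raw disk left after hardware
    redundancy: 1.0 for JBOD, 0.5 for RAID-1 mirroring (assumed values).
    replication is the HDFS replication factor.
    """
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb * raid_efficiency / replication


# Hypothetical cluster: 20 data nodes, 12 drives each, 2 TB per drive.
NODES, DISKS, DISK_TB = 20, 12, 2

# (label, raid_efficiency, replication) for the layouts being compared.
layouts = [
    ("JBOD, replication 3", 1.0, 3),
    ("RAID-1 on DN, replication 3", 0.5, 3),
    ("RAID-1 on DN, replication 2", 0.5, 2),
]

for label, efficiency, replication in layouts:
    tb = usable_tb(NODES, DISKS, DISK_TB, efficiency, replication)
    print(f"{label}: {tb:.0f} TB usable")
```

Under those assumptions the mirrored layouts hold noticeably less unique data per node than
JBOD, which is the density and TCO hit; dropping the replication factor to 2 only recovers
part of it. The recovery-time benefit is not modeled here.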

Now I have to state the obvious caveats... 1) YMMV, 2) The factors which go into the cluster
design decision are going to be unique to the company setting up the cluster.

These are IMHO, and you know what they say about opinions... ;-) 


On Nov 1, 2012, at 7:52 AM, Oleg Ruchovets <oruchovets@gmail.com> wrote:

> Do you mean RAID 10 for Master Node?
> What about DataNode?
> Thanks
> Oleg.
> On Thu, Nov 1, 2012 at 2:43 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>> I prefer RAID 10, but some say RAID 6.
>> I thought NetApp used RAID 6?
>> It's definitely an interesting discussion point though.
>> -Mike
>> On Nov 1, 2012, at 7:37 AM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>>> Hi,
>>>  What is the best practice for disk RAID (master and data nodes)?
>>> Thanks in advance
>>> Oleg.
