hadoop-user mailing list archives

From Ulul <had...@ulul.org>
Subject Hadoop and RAID 5
Date Wed, 01 Oct 2014 21:01:47 GMT
Dear hadoopers,

Has anyone been confronted with deploying a cluster in a traditional IT 
shop whose admins handle thousands of servers?
They traditionally use SAN or NAS storage for app data, rely on RAID 1 
for system disks and in the few cases where internal disks are used, 
they configure them with RAID 5 provided by the internal HW controller.

Using a JBOD setup, as advised in every Hadoop doc I have ever laid 
my hands on, means that each HDD failure will require, on top of the 
physical replacement of the drive, that an admin performs at least an mkfs.
Since these operations will become more frequent as more internal disks 
are used, this can be perceived as an annoying disruption to the 
industrialized handling of numerous servers.
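
To make the extra admin work concrete, here is a minimal sketch of the 
steps that would follow the physical swap of a failed JBOD data disk. 
It only builds and prints the commands, it does not run them. The device 
name, mount point, filesystem type and the hdfs:hadoop owner are 
assumptions for illustration, not taken from any particular cluster.

```python
import shlex


def replacement_commands(device, mount_point, fs_type="ext4", owner="hdfs:hadoop"):
    """Return the commands to bring a replaced JBOD data disk back (dry run only).

    All argument values are hypothetical placeholders chosen by the caller.
    """
    return [
        # Recreate the filesystem on the new, empty drive.
        ["mkfs", "-t", fs_type, device],
        # Mount it again at the data directory the DataNode is configured to use.
        ["mount", "-o", "noatime", device, mount_point],
        # Hand the directory back to the user running the DataNode (assumed owner).
        ["chown", "-R", owner, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical device and mount point; print the plan instead of executing it.
    for cmd in replacement_commands("/dev/sdX1", "/data/disk3"):
        print(shlex.join(cmd))
```

With RAID 5 the controller rebuilds the array after the swap and none of 
these steps are needed, which is what our admins are used to.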

In Tom White's guide there is a discussion of RAID 0, stating that Yahoo 
benchmarks showed a 10% loss in performance compared to JBOD, so we can 
expect even worse performance with RAID 5, but I found no figures.

I also found a Hortonworks interview with StackIQ, which provides software 
to automate fixing such failures. But it would be rather painful to go 
straight to another solution, another contract and so on while just 
starting with Hadoop.

Please share your experiences with RAID for redundancy (1, 5 or other) 
in Hadoop configurations.

Thank you
