hadoop-common-user mailing list archives

From Runping Qi <runp...@yahoo-inc.com>
Subject Re: RAID vs. JBOD
Date Thu, 15 Jan 2009 15:28:05 GMT

Yes, all the machines in the tests are new, with the same spec.
The 30% to 50% throughput variations were observed across disks within
the same machines.


On 1/15/09 2:41 AM, "Steve Loughran" <stevel@apache.org> wrote:

> Runping Qi wrote:
>> Hi,
>> We at Yahoo did some Hadoop benchmarking experiments on clusters with JBOD
>> and RAID0. We found that under heavy loads (such as gridmix), the JBOD
>> cluster performed better.
>>
>> Gridmix tests:
>>   Load: gridmix2
>>   Cluster size: 190 nodes
>>   Test results:
>>     RAID0: 75 minutes
>>     JBOD:  67 minutes
>>     Difference: 10%
>>
>> HDFS write performance tests:
>> We ran map-only jobs writing data to DFS concurrently on different clusters.
>> The overall DFS write throughput on the JBOD cluster was 30% (with a
>> 58-node cluster) and 50% (with an 18-node cluster) better than on the
>> RAID0 cluster.
>> To understand why, we did some file-level benchmarking on both clusters.
>> We found that the file write throughput on a JBOD machine is 30% higher than
>> that on a comparable machine with RAID0. This performance difference may be
>> explained by the fact that the throughput of different disks can vary by 30%
>> to 50%. With such variations, the overall throughput of a RAID0 system may
>> be bottlenecked by the slowest disk.
>> -- Runping
> This is really interesting. Thank you for sharing these results!
> Presumably the servers were all set up with "nominally" homogeneous
> hardware? And yet the variations still existed. That would be something
> to experiment with on new versus old clusters to see if it gets worse
> over time.
> Here we have a batch of desktop workstations all bought at the same
> time, to the same spec, but one of them, "lucky", is more prone to race
> conditions than any of the others. We don't know why, and assume it's to
> do with the (multiple) Xeon CPU chips being at different ends of the bell
> curve or something. All we know is: test on that box before shipping to
> find race conditions early.
> -steve
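
The slowest-disk explanation in the quoted message can be illustrated with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions, not the benchmark that was run: it assumes RAID0 stripes data evenly, so every disk is paced by the slowest one, and that JBOD writers keep every disk busy independently. The function names and the per-disk MB/s figures are hypothetical and chosen purely for illustration.

```python
def raid0_throughput(disk_mbps):
    """Idealized RAID0: even striping means each disk moves the same amount
    of data, so the array runs at (number of disks) x (slowest disk)."""
    return len(disk_mbps) * min(disk_mbps)


def jbod_throughput(disk_mbps):
    """Idealized JBOD: independent writers keep each disk busy at its own
    speed, so the aggregate is the sum of the per-disk rates."""
    return sum(disk_mbps)


def jbod_advantage(disk_mbps):
    """Fractional gain of JBOD over RAID0 under the two models above."""
    return jbod_throughput(disk_mbps) / raid0_throughput(disk_mbps) - 1.0


if __name__ == "__main__":
    # Hypothetical disks: the slowest is 30% slower than the fastest.
    disks = [100, 90, 80, 70]  # MB/s, made-up values
    print("RAID0:", raid0_throughput(disks), "MB/s")   # 4 * 70 = 280
    print("JBOD: ", jbod_throughput(disks), "MB/s")    # 100 + 90 + 80 + 70 = 340
    print(f"JBOD advantage: {jbod_advantage(disks):.0%}")  # 21%
```

In this toy model the JBOD advantage grows with the spread between the fastest and slowest disk, which is consistent in direction with the variation reported above. It does not account for controller overhead, file system effects, or uneven load across JBOD disks.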
