hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Is SAN storage is a good option for Hadoop ?
Date Thu, 29 Sep 2011 13:06:59 GMT
On 29/09/11 13:28, Brian Bockelman wrote:
> On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote:
>> Hi,
>> I want to know whether we can use SAN storage for a Hadoop cluster setup.
>> If yes, what are the best practices?
>> Is it a good approach, considering the fact that "the underlying power of Hadoop
>> is co-locating the processing power (CPU) with the data storage and thus it
>> must be local storage to be effective"?
>> *But also, is it better to say “local is better” in the situation where I
>> have a single local 5400 RPM IDE drive, which would be dramatically slower
>> than SAN storage striped across many drives spinning at 10k RPM and
>> accessed via Fibre Channel?*
> Hi Praveenesh,
> Two things:
> 1) If the option is a single 5400 RPM IDE drive (you can still buy those?) versus a high-end
> SAN, the high-end SAN is going to win.  That's often a false comparison: the question is often
> "What can I buy for $50k?".  In that case (setting aside organizational politics), you can
> buy more spindles in the "traditional" Hadoop setup than with the SAN.
>    - Also, if you're latency limited, you're likely working against yourself.  The best
> thing I ever did for my organization was make our software work just as well with 100ms latency
> as with 1ms latency.
> 2) As Paul pointed out, you have to ask yourself whether the SAN is shared or dedicated.
> Many SANs don't have the ability to strongly partition workloads between users.
> Brian

One more: a SAN is a SPOF. [Gray05] includes the impact of a SAN outage on
MS TerraServer, while [Jiang08] provides evidence that entry-level
Fibre Channel storage is less reliable than SATA due to interconnects.

Anyone who criticises the NameNode for being a SPOF and relies on a SAN 
instead is missing something obvious.

[Gray05] Empirical Measurements of Disk Failure Rates and Error Rates
[Jiang08] Are disks the dominant contributor for storage failures?
