hadoop-mapreduce-user mailing list archives

From "Pamecha, Abhishek" <apame...@x.com>
Subject HDFS on SAN
Date Tue, 16 Oct 2012 22:24:14 GMT
Hi
<not sure if my previous message made it as I just subscribed>

I have read scattered documentation across the net, most of which says HDFS doesn't work well
when a SAN is used to store the data, while some say it is an emerging trend. I would love to
know if any tests have been performed that hint at the aspects in which direct storage excels
or falls behind a SAN.

We are investigating whether a direct storage option is better than SAN storage for a modest
cluster with data in 100 TBs in steady state. The SAN can of course support an order of magnitude
more IOPS than we care about for now, but given that it is shared infrastructure and we may
expand our data size, that may not be an advantage in the future.

Another thing I am interested in: for MR jobs, where data locality is the key driver, how
does that pan out when using a SAN instead of direct storage?

And of course, on the subjective topics of availability and reliability when using a SAN for
data storage in HDFS, I would love to hear your views.

Thanks,
Abhishek


