hadoop-hdfs-user mailing list archives

From Per Steffensen <st...@designware.dk>
Subject Re: HDFS vs software RAID like md(adm)
Date Thu, 15 Sep 2011 06:47:28 GMT
David Rosenstrauch wrote:
> On 09/14/2011 02:02 PM, Per Steffensen wrote:
>> Hi
>> If my goal is to have multiple physical disks seem as one big disk with
>> redundancy built in, why would I use a HDFS cluster among machines with
>> one disk each, instead of using software RAID like md(adm) directly on
>> top of the disks? I am looking for pros and cons on the two solutions.
>> http://en.wikipedia.org/wiki/RAID#Software-based_RAID
>> http://en.wikipedia.org/wiki/Mdadm
>> Regards, Per Steffensen
> HDFS was never intended to be a general-purpose file system.  It is a 
> system optimized for a) running map/reduce, and b) holding large 
> files.  It should not be considered as a replacement for RAID.
> DR
Thanks for your reply, David. Even though HDFS wasn't intended to be used
for this, I guess it could be. So if we forget for a moment that it was
not designed/optimized to be used as a general-purpose file system
(GPFS), what are the pros and cons of using it as a GPFS with built-in
redundancy vs. using software RAID? Is HDFS too slow for some kinds of
file operations, or what will the problems (and benefits) be? I hope for
some input, as I need arguments for and against to use in a discussion
with a customer. Thanks!
