hadoop-hdfs-user mailing list archives

From Per Steffensen <st...@designware.dk>
Subject Re: HDFS vs software RAID like md(adm)
Date Thu, 15 Sep 2011 07:30:03 GMT
Norman Maurer wrote:
> You should keep in mind that HDFS is not POSIX compliant, so you will
> have a hard time using it as a "real" fs. I know there is a FUSE driver
I guess there are a few solutions: http://wiki.apache.org/hadoop/MountableHDFS
An alternative would be to write the file-accessing code directly
against the HDFS API, or perhaps against a VFS
(http://en.wikipedia.org/wiki/Virtual_file_system) other than the FUSE VFS
(http://en.wikipedia.org/wiki/Filesystem_in_Userspace) that mounting
gives us. Of course, it would have to be a VFS that has a port to HDFS, e.g.
this port (https://issues.apache.org/jira/browse/HDFS-1213) to Apache
Commons VFS (http://commons.apache.org/vfs/).
> for it, but I would not use it for heavy usage.
OK, thanks. It will be used heavily, so that is a good con.
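The "write the file-accessing code against a VFS instead of mounting" alternative can be sketched as follows. This is a minimal Java sketch under stated assumptions: `SimpleVfs`, `LocalVfs` and `VfsDemo` are hypothetical names (not part of Hadoop or Apache Commons VFS), and only a local-disk backend is implemented so the example runs on its own. An HDFS backend would implement the same interface by delegating to Hadoop's `org.apache.hadoop.fs.FileSystem` (`FileSystem.get(conf)`, `create(Path)`, `open(Path)`) rather than going through a FUSE mount.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical minimal VFS interface: application code depends only on this,
// so the storage backend (local disk, HDFS, ...) can be swapped without a FUSE mount.
interface SimpleVfs {
    OutputStream create(String path) throws IOException;
    InputStream open(String path) throws IOException;
}

// Local-disk backend rooted at a base directory. An HDFS backend would instead
// wrap org.apache.hadoop.fs.FileSystem (not shown, to keep this self-contained).
class LocalVfs implements SimpleVfs {
    private final Path root;

    LocalVfs(Path root) { this.root = root; }

    @Override
    public OutputStream create(String path) throws IOException {
        Path p = root.resolve(path);
        Path parent = p.getParent();
        if (parent != null) Files.createDirectories(parent); // create missing dirs
        return Files.newOutputStream(p); // creates or truncates the file
    }

    @Override
    public InputStream open(String path) throws IOException {
        return Files.newInputStream(root.resolve(path));
    }
}

class VfsDemo {
    // Application code: written once against SimpleVfs, unaware of the backend.
    static void writeText(SimpleVfs fs, String path, String text) {
        try (OutputStream out = fs.create(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readText(SimpleVfs fs, String path) {
        try (InputStream in = fs.open(path)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design point is that switching from local disk to HDFS then means adding one more `SimpleVfs` implementation, not changing the application code or depending on the FUSE driver's behaviour under heavy load.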
> Also, HDFS is not really
> a good fit for random access at all.
Also a good con.
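The random-access point mostly concerns writes. The toy model below is a hypothetical Java class, not HDFS code: it assumes HDFS's write-once model, in which data is only ever added at the end of a file (append support itself varies across Hadoop versions) and existing bytes cannot be overwritten in place, while positioned reads are allowed. A workload that updates records in the middle of a file, which a POSIX filesystem on top of software RAID handles directly, has no equivalent operation here.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Toy model (hypothetical, not HDFS code) of write-once/append-only file semantics:
// bytes can be appended and read back at any offset, but never overwritten in place.
class AppendOnlyFile {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();

    // Appending at the end is the only way to add data.
    void append(byte[] bytes) {
        data.write(bytes, 0, bytes.length);
    }

    // Positioned reads are allowed (HDFS input streams support seek).
    byte[] read(int offset, int length) {
        byte[] all = data.toByteArray();
        if (offset < 0 || offset > all.length || length < 0) {
            throw new IndexOutOfBoundsException("bad offset/length");
        }
        int end = Math.min(all.length, offset + length);
        return Arrays.copyOfRange(all, offset, end);
    }

    // In-place updates are rejected; on a POSIX fs this would be a plain pwrite().
    void writeAt(int offset, byte[] bytes) {
        throw new UnsupportedOperationException("in-place writes are not supported");
    }

    int length() { return data.size(); }
}
```

Under this model, "updating" a record means rewriting the whole file (or keeping an external log of changes), which is why update-heavy, random-write workloads fit poorly.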
> If you really need a POSIX fs, I would recommend you have a look at
> DRBD or GlusterFS.
Thanks. I will have a look at those.
> Bye,
> Norman
> 2011/9/15 Per Steffensen <steff@designware.dk>:
>> David Rosenstrauch wrote:
>>> On 09/14/2011 02:02 PM, Per Steffensen wrote:
>>>> Hi
>>>> If my goal is to have multiple physical disks appear as one big disk with
>>>> built-in redundancy, why would I use an HDFS cluster among machines with
>>>> one disk each, instead of using software RAID like md(adm) directly on
>>>> top of the disks? I am looking for pros and cons of the two solutions.
>>>> http://en.wikipedia.org/wiki/RAID#Software-based_RAID
>>>> http://en.wikipedia.org/wiki/Mdadm
>>>> Regards, Per Steffensen
>>> HDFS was never intended to be a general-purpose file system.  It is a
>>> system optimized for a) running map/reduce, and b) holding large files.  It
>>> should not be considered as a replacement for RAID.
>>> DR
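The "holding large files" point can be made concrete with a back-of-the-envelope count. HDFS stores each file as one or more blocks, and the NameNode keeps metadata for every file and every block in memory, so many small files cost far more metadata than the same bytes in a few large files. This Java sketch assumes a 64 MB block size (the `dfs.block.size` default in Hadoop releases of this era; it is configurable) and counts one namespace object per file plus one per block; `BlockMath` is a hypothetical helper, and the memory cost per object is deliberately left out.

```java
// Back-of-the-envelope count of NameNode namespace objects (files + blocks).
// Assumption: 64 MB block size; a file of n bytes uses ceil(n / blockSize) blocks.
class BlockMath {
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    static long blocksFor(long fileBytes) {
        if (fileBytes <= 0) return 0;
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Objects the NameNode must track for `count` files of `fileBytes` bytes each.
    static long namespaceObjects(long count, long fileBytes) {
        return count * (1 + blocksFor(fileBytes));
    }
}
```

For example, 10 GiB stored as 160 files of 64 MiB gives 320 objects, while the same 10 GiB stored as 10,485,760 files of 1 KiB gives 20,971,520 objects, which is the kind of general-purpose workload a RAID-backed local filesystem absorbs more easily.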
>> Thanks for your reply, David. Although HDFS wasn't intended to be used for
>> this, I guess it could be. So if we forget for a moment that it was not
>> designed/optimized to be used as a general-purpose file system (GPFS), what
>> are the pros and cons of using it as a GPFS with built-in redundancy vs.
>> using software RAID? Is HDFS too slow for some kinds of file operations, or
>> what will the problems (and benefits) be? I hope for some input; I need
>> arguments for and against to use in a discussion with a customer.
>> Thanks!
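One concrete input for the pros/cons discussion is how much raw disk each scheme spends on redundancy. A minimal Java sketch, assuming n equal disks of size s, two-copy mirroring as in md RAID 10 (the "near 2" layout), RAID 5 with one disk's worth of parity, and HDFS with its default replication factor of 3 (`dfs.replication`, configurable per file) and one disk per DataNode as in the original question. `Capacity` is a hypothetical helper; filesystem overhead, hot spares and HDFS reserved space are ignored.

```java
// Usable capacity for n equal disks of size s (any unit, e.g. GB).
// Assumptions: mirroring keeps 2 copies (md RAID 10, "near 2" layout),
// RAID 5 spends one disk's worth on parity, HDFS keeps `replication` copies of each block.
class Capacity {
    static double raid10TwoCopies(int n, double s) {
        if (n < 2) throw new IllegalArgumentException("mirroring needs at least 2 disks");
        return n * s / 2;
    }

    static double raid5(int n, double s) {
        // 3 disks is the conventional minimum for RAID 5.
        if (n < 3) throw new IllegalArgumentException("RAID 5 needs at least 3 disks");
        return (n - 1) * s;
    }

    // One disk per DataNode; replicas go to different nodes, so the
    // replication factor cannot usefully exceed the number of nodes.
    static double hdfs(int n, double s, int replication) {
        if (replication < 1 || replication > n) {
            throw new IllegalArgumentException("replication must be in [1, n]");
        }
        return n * s / replication;
    }
}
```

For six 1000 GB disks this gives 3000 GB (RAID 10), 5000 GB (RAID 5) and 2000 GB (HDFS, replication 3). On the other side of the trade-off, HDFS places replicas on different machines, so it can survive the loss of a whole node, whereas an md array inside a single host cannot.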
