hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Alternative distributed filesystem.
Date Fri, 13 Nov 2009 22:12:20 GMT
If you are looking for a large distributed filesystem with POSIX locking, look at:

GlusterFS
Lustre
OCFS2
Red Hat GFS
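
Whichever of these is chosen, it may be worth probing the mount before relying on it. Below is a minimal sketch, assuming a Linux node with Python available, that checks whether a POSIX byte-range advisory lock can be taken on a file in a given directory. The function name `supports_posix_lock` is hypothetical and not part of any of the filesystems above. It only shows that the lock call succeeds on one node; it does not prove that locks are coherent across nodes.

```python
import fcntl
import os
import tempfile


def supports_posix_lock(directory):
    """Return True if an exclusive POSIX advisory lock can be taken
    on a scratch file created inside `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # Non-blocking exclusive lock on the first byte of the file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
        except OSError:
            # The filesystem (or its mount options) refused the lock.
            return False
        # Release the lock again before cleaning up.
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `supports_posix_lock("/mnt/cluster")` (a hypothetical mount point) on a worker node gives a quick yes or no. A fuller check would take the lock on one node and confirm that a second node is refused.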

Edward
On Fri, Nov 13, 2009 at 5:07 PM, Michael Thomas <thomas@hep.caltech.edu> wrote:
> Hi Dmitry,
>
> I still stand by my original statement.  We do use fuse_dfs for reading data
> on all of the worker nodes.  We don't use it much for writing data, but only
> because our project's data model was never designed to use a posix
> filesystem for writing data, only reading.
>
> --Mike
>
> On 11/13/2009 02:04 PM, Dmitry Pushkarev wrote:
>>
>> Mike,
>>
>> I guess what I said referred to the use of fuse_dfs as a general solution. If
>> we were to use native APIs that'd be perfect. But we basically need to mount
>> it as a place where programs can simultaneously dump large amounts of data.
>>
>> -----Original Message-----
>> From: Michael Thomas [mailto:thomas@hep.caltech.edu]
>> Sent: Friday, November 13, 2009 2:00 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Alternative distributed filesystem.
>>
>> On 11/13/2009 01:56 PM, Dmitry Pushkarev wrote:
>>>
>>> Dear Hadoop users,
>>>
>>> One of our Hadoop clusters is being converted to SGE to run a very specific
>>> application, and we're thinking about how to utilize the huge hard drives
>>> that are there. Since there will be no Hadoop installed on these nodes, we're
>>> looking for an alternative distributed filesystem that will have decent
>>> concurrent read/write performance (compared to HDFS) for large amounts of
>>> data. Using single file storage, like NAS RAID arrays, proved to be very
>>> ineffective when someone is pushing gigabytes of data to them.
>>>
>>> What other systems can we look at? We would like the FS to be mounted on
>>> every node and to be open source. Ideally it would also have POSIX
>>> compliance and decent random access performance (though that isn't critical).
>>>
>>> HDFS doesn't fit the bill because mounting it via fuse_dfs and using it
>>> without any mapred jobs (i.e. data will typically be pushed from 1-2 nodes
>>> at most, at different times) seems slightly "ass-backward" to say the least.
>>
>> I would hardly call it ass-backwards.  I know of at least 3 HPC clusters
>> that use only the HDFS component of Hadoop to serve 500TB+ of data to
>> 100+ worker nodes.
>>
>> As a cluster filesystem, HDFS works pretty darn well.
>>
>> --Mike
