hadoop-common-user mailing list archives

From Michael Thomas <tho...@hep.caltech.edu>
Subject Re: Alternative distributed filesystem.
Date Fri, 13 Nov 2009 22:07:54 GMT
Hi Dmitry,

I still stand by my original statement.  We do use fuse_dfs for reading 
data on all of the worker nodes.  We don't use it much for writing data, 
but only because our project's data model was never designed to use a 
POSIX filesystem for writing data, only for reading.
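
For anyone curious about the setup, the mount looks roughly like the sketch 
below.  The namenode host, port, and mount point are placeholders for your 
own cluster, and the exact wrapper and fstab syntax vary between Hadoop 
releases, so check the docs for the version you're running:

  # Hypothetical example: mount HDFS on a worker node via the fuse_dfs
  # contrib module.  namenode.example.org:9000 and /mnt/hdfs are placeholders.
  mkdir -p /mnt/hdfs
  fuse_dfs_wrapper.sh dfs://namenode.example.org:9000 /mnt/hdfs

  # Roughly equivalent /etc/fstab entry (option names differ across versions):
  # fuse_dfs#dfs://namenode.example.org:9000 /mnt/hdfs fuse allow_other,ro 0 0

The jobs on the worker nodes then just read their input through the mount 
point as if it were a local POSIX path.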

--Mike

On 11/13/2009 02:04 PM, Dmitry Pushkarev wrote:
> Mike,
>
> I guess what I said referred to the use of fuse_dfs as a general solution. If
> we were to use native APIs, that'd be perfect. But we basically need to mount
> it as a place where programs can simultaneously dump large amounts of data.
>
> -----Original Message-----
> From: Michael Thomas [mailto:thomas@hep.caltech.edu]
> Sent: Friday, November 13, 2009 2:00 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Alternative distributed filesystem.
>
> On 11/13/2009 01:56 PM, Dmitry Pushkarev wrote:
>> Dear Hadoop users,
>>
>>
>>
>> One of our Hadoop clusters is being converted to SGE to run a very specific
>> application, and we're thinking about how to utilize the huge hard drives
>> already in those nodes. Since there will be no Hadoop installed on these
>> nodes, we're looking for an alternative distributed filesystem with decent
>> concurrent read/write performance (compared to HDFS) for large amounts of
>> data. Using a single file store, such as a NAS RAID array, proved to be very
>> ineffective when someone is pushing gigabytes of data onto it.
>>
>>
>>
>> What other systems can we look at? We would like the FS to be mountable on
>> every node and open source; ideally we'd also like POSIX compliance and
>> decent random access performance (though that isn't critical).
>>
>> HDFS doesn't fit the bill because mounting it via fuse_dfs and using it
>> without any mapred jobs (i.e. data will typically be pushed from 1-2 nodes
>> at most, at different times) seems slightly "ass-backward" to say the least.
>
> I would hardly call it ass-backwards.  I know of at least 3 HPC clusters
> that use only the HDFS component of Hadoop to serve 500TB+ of data to
> 100+ worker nodes.
>
> As a cluster filesystem, HDFS works pretty darn well.
>
> --Mike
>
>
>


