hadoop-common-user mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Use HDFS as a long term storage solution?
Date Thu, 06 Sep 2007 04:54:17 GMT
We are very interested in ideas and patches to improve the system's
stability.

This is very young software, but we are using it at very large scale  
and intend to keep enhancing it.  We currently have a 2000 node file  
system with 3TB raw storage per node and are supporting millions of  
files.
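
As a rough illustration of what these numbers mean, the sketch below computes raw and usable capacity. It assumes every block is stored with HDFS's default replication factor of 3 (the message does not state the factor this cluster uses) and ignores disk space set aside for the OS, logs and MapReduce temporary output, so the result is an idealized upper bound. The function name `usable_capacity_tb` is hypothetical.

```python
def usable_capacity_tb(raw_tb: float, replication: int = 3) -> float:
    """Idealized user-visible capacity when each block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be >= 1")
    return raw_tb / replication


# Cluster described above: 2000 nodes with 3 TB of raw storage each.
nodes, raw_per_node_tb = 2000, 3
raw_tb = nodes * raw_per_node_tb  # 6000 TB, i.e. about 6 PB raw

# Assumes replication factor 3 (the HDFS default); the real setting is not stated.
print(raw_tb, usable_capacity_tb(raw_tb))  # 6000 2000.0
```

The same function reproduces the figure in C G's plan quoted below: `usable_capacity_tb(64)` is about 21.3 TB, matching the "~21T data space" for a 64 TB repository at replication 3.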

With careful shepherding we've had no data loss events this year (we
only recently reached 2000 nodes). The biggest risk right now is
user error, due to the lack of basic access protections. Some fixes for
that are in the works.



On Sep 5, 2007, at 5:56 PM, Jeff Hammerbacher wrote:

> We have very similar plans for Hadoop to what C G describes below, but
> we've found the stability of HDFS to be quite troublesome. We've
> corrupted HDFS three different ways in a few weeks: 1) running jstack on
> the NameNode; 2) loading lots of small files into HDFS, causing it to
> hang on a Map/Reduce job and subsequently display corruption on restart;
> 3) upgrading to a newer version of Hadoop. Thus we are very uncertain
> about treating HDFS as a reliable long-term data store.
>
> That being said, we're excited about the opportunities created by
> Hadoop, so we're going to put some time into making it more reliable and
> creating a utility to archive data out of HDFS for backup purposes.
>
> On 9/5/07, C G <parallelguy@yahoo.com> wrote:
> >
> > Our intention is to use HDFS as the core of a large "data repository".
> > We store "raw" data within HDFS on a more-or-less permanent basis, and
> > map/reduce it to produce load files for our data warehouse. We have
> > other plans as well, all centered around storing data in HDFS on a very
> > long-term basis. So you're in good company...
> >
> >   Our plan is for a 64 TB (raw) HDFS repository with a replication
> > factor of 3, giving ~21 TB of usable data space.
> >
> >   C G
> >
> >
> > Dongsheng Wang <phidecn@yahoo.com> wrote:
> >
> > We are looking at using HDFS as a long-term storage solution. We want
> > to use it to store lots of files. The files can be big or small: images,
> > videos, etc. We only write the files once, and may read them many
> > times. It sounds like a perfect fit for HDFS.
> >
> > The concern is that, since it has been engineered to support
> > MapReduce, there may be fundamental assumptions that the data stored in
> > HDFS is transient in nature. Obviously, for our scalable storage
> > solution, zero data loss or corruption is a strict requirement.
> >
> > Is anybody using HDFS as a long-term storage solution? We're interested
> > in any info. Thanks.
> >
> > - ds
>

