hadoop-common-user mailing list archives

From Dongsheng Wang <phid...@yahoo.com>
Subject Re: Use HDFS as a long term storage solution?
Date Thu, 06 Sep 2007 02:04:17 GMT
> 2) loading lots of small files into HDFS, causing it to hang on a Map/Reduce
> job and subsequently display corruption on restart;

I had a problem too when we loaded thousands of files into Hadoop and ran a MapReduce job. I thought
it was just a performance problem. Apparently Hadoop starts a new process for each task;
I believe that might be the reason.
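
To make the suspicion above concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hadoop code. It assumes that every input file produces at least one map task (input splits do not span files), that each task pays a fixed process startup cost, and that the block size is 64 MB. The function names, the 1-second startup cost, and the 8 parallel slots are all hypothetical values chosen for illustration, not measurements.

```python
import math

# Assumption: 64 MB block size, used here only to size the input splits.
BLOCK_SIZE_BYTES = 64 * 1024 * 1024


def estimate_map_tasks(file_sizes_bytes, block_size_bytes=BLOCK_SIZE_BYTES):
    """Estimate map tasks, assuming at least one task per file and
    one task per block for files larger than a block."""
    return sum(max(1, math.ceil(size / block_size_bytes))
               for size in file_sizes_bytes)


def estimate_startup_overhead_seconds(num_tasks,
                                      per_task_startup_seconds,
                                      parallel_slots):
    """Estimate time spent only on starting task processes.
    Tasks run in waves of `parallel_slots`; both parameters are
    hypothetical inputs, not values read from a cluster."""
    waves = math.ceil(num_tasks / parallel_slots)
    return waves * per_task_startup_seconds


# 5,000 small files of 100 KiB each (illustrative numbers).
small_files = [100 * 1024] * 5000
tasks_small = estimate_map_tasks(small_files)

# The same bytes stored as a single large file.
tasks_packed = estimate_map_tasks([sum(small_files)])

overhead_small = estimate_startup_overhead_seconds(
    tasks_small, per_task_startup_seconds=1.0, parallel_slots=8)
overhead_packed = estimate_startup_overhead_seconds(
    tasks_packed, per_task_startup_seconds=1.0, parallel_slots=8)

print(tasks_small, tasks_packed)        # number of tasks in each layout
print(overhead_small, overhead_packed)  # startup-only seconds in each layout
```

Under these assumptions the task count grows with the number of files rather than with the amount of data, which would be consistent with a per-task process cost dominating a job over thousands of small files.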

Jeff Hammerbacher <jeff.hammerbacher@gmail.com> wrote:
We have very similar plans for Hadoop to what C G quotes below, but we've
found the stability of HDFS to be quite troublesome.  We've corrupted HDFS
three different ways in a few weeks: 1) running jstack on the NameNode; 2)
loading lots of small files into HDFS, causing it to hang on a Map/Reduce
job and subsequently display corruption on restart; 3) upgrading to a newer
version of Hadoop.  Thus we are very uncertain about treating HDFS as a
reliable long-term data store.

That being said, we're excited about the opportunities created by Hadoop so
we're going to put some time into making it more reliable and creating a
utility to archive data out of HDFS for backup purposes.

On 9/5/07, C G wrote:
>
> Our intention is to use HDFS as the core of a large "data repository".  We
> store "raw" data within HDFS on a more-or-less permanent basis, and
> map/reduce it to produce load files for our data warehouse.  We have other
> plans as well, all centered around storing data on a very long term basis in
> HDFS.  So you're in good company...
>
>   Our plan is for a 64T HDFS repository, with a replication factor of 3
> for a ~21T data space.
>
>   C G
>
>
> Dongsheng Wang wrote:
>
> We are looking at using HDFS as a long term storage solution. We want to
> use it to store lots of files. The files could be big or small; they are
> images, videos, etc. We only write the files once, and may read them many
> times. It sounds like HDFS is a perfect fit.
>
> The concern is that since it's been engineered to support MapReduce there
> may be fundamental assumptions that the data being stored by HDFS is
> transient in nature. Obviously for our scalable storage solution zero data
> loss or corruption is a heavy requirement.
>
> Is anybody using HDFS as a long term storage solution? Interested in any
> info. Thanks
>
> - ds
