hadoop-hdfs-user mailing list archives

From Ahmed El Zein <ahmed.elz...@anu.edu.au>
Subject Re: Using HDFS just for storage
Date Thu, 25 Feb 2010 01:36:08 GMT
I am currently scoping out HDFS for a 3PB (raw) storage array using
the Cloudera CDH2 distribution. While the main objective is to store
scientific datasets, I can't wait to experiment with the added bonus
of having Hadoop do data analysis on the datasets.

We have to be able to support large data transfer over (possibly
multiple) 10GigE links so we are looking at GridFTP for data

I have just started, so I don't have any conclusions yet, but I believe
that Caltech and other universities are using Hadoop or HDFS for


On Thu, Feb 25, 2010 at 11:50 AM, Greg Connor <gconnor@createspace.com> wrote:
> Hello HDFS users,
> We are considering using Hadoop just as a clustered storage solution,
> and I'm wondering if anyone has used it like this, and might have some
> experiences or wisdom to share?
> We need to distribute lots of large files over 30+ machines, and HDFS
> seems to have all the right features, including replication, reacting
> automatically to downed nodes, etc.  From a features point of view, it
> seems to be a good fit, but I really want to know if this is backed up
> by any real-world experience.
> First concern I have: Some of our initial throughput tests show that
> transferring files into and out of HDFS is noticeably slower than just a
> straight copy to the machine would be... I was hoping the throughput
> would be the same, or better in cases where my hadoop client machine can
> talk to many datanodes at once.  Is this lower copy throughput expected,
> or is there perhaps something I've failed to tune?
> My other concern would be, what would happen if we set the default
> replication to 2... I know 3 is the customary setting but we really need
> to keep the costs down.  Does anyone have real-world experience with
> maintaining a medium-sized farm with replication set to 2?  Anything to
> watch out for?
> Thanks for any feedback.  You can write me directly and I'll be happy to
> summarize findings back to the list if there is interest.
> gregc
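
To put the replication question in rough numbers, here is a minimal sketch of the capacity trade-off between a replication factor of 2 and 3. It assumes the 3PB raw figure mentioned in the reply above, and it ignores space reserved for non-DFS use, temporary files and filesystem overhead, so real usable capacity would be somewhat lower. The function name usable_capacity_pb is made up for illustration and is not part of any Hadoop API.

```python
def usable_capacity_pb(raw_pb, replication):
    """Approximate usable capacity when every block is stored `replication` times.

    Simplifying assumption: all raw space is available to HDFS, with no
    reserved space or per-block overhead.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return raw_pb / replication


if __name__ == "__main__":
    raw_pb = 3.0  # raw capacity taken from the reply above
    for factor in (2, 3):
        usable = usable_capacity_pb(raw_pb, factor)
        print(f"replication={factor}: about {usable:.2f} PB usable of {raw_pb:.0f} PB raw")
```

Under these assumptions, dropping from 3 to 2 replicas raises usable space from about 1 PB to about 1.5 PB. The cost is that each block survives the loss of only one copy, not two, before it becomes unavailable.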
