hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Looking for some expert insights on HDFS
Date Sun, 21 Oct 2007 17:26:04 GMT

On 10/21/07 4:13 AM, "Yoav Steinberg" <yoav@monfort.co.il> wrote:

> - How do we handle failure of the single metaserver/namenode? Is there a
> way to build a no "downtime" solution?

HDFS is currently designed in such a way that you should be able to achieve
low downtime.  It is not intended for zero downtime operation (yet).

> - What are the major differences between KFS and HDFS? - spec wise they
> seem similar.

I don't know much about KFS, but from superficial examination, KFS is a bit
more like a real file system (you can re-write files) but is written in C++
by a separate team from the Hadoop developers.  HDFS is written in Java and
is maintained by the Hadoop team which leads to tighter tracking.

I imagine that the KFS developers would know much more about this issue.

> - Our service needs to handle a large amount of small (typically 1-20MB)
> in size files. Is HDFS/KFS appropriate for this?

Define "large amount".  If you mean < 10-20 x 10^6, HDFS should be fine.  If
you mean 100 x 10^6, you will probably have problems.
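The limiting factor here is namenode heap: every file, directory, and block lives in the namenode's memory. As a rough sanity check, assuming the commonly cited rule of thumb of roughly 150 bytes of heap per namespace object (an approximation, not a spec):

```java
// Rough namenode heap estimate for a small-file workload.
// Assumption (rule of thumb, not a guarantee): ~150 bytes of namenode
// heap per namespace object, where each file costs one inode object
// plus one object per block.
public class NamenodeHeap {
    static long estimateBytes(long files, long blocksPerFile, long bytesPerObject) {
        long objects = files * (1 + blocksPerFile); // one inode + its blocks
        return objects * bytesPerObject;
    }

    public static void main(String[] args) {
        // 20 million files of 1-20MB each fit in a single default-size block.
        long heap = estimateBytes(20_000_000L, 1, 150);
        System.out.println(heap); // 6000000000 -- about 6 GB of heap
    }
}
```

At 20 x 10^6 files that is on the order of a few gigabytes of heap, which is workable; at 100 x 10^6 files the same arithmetic starts to hurt, which matches the numbers above.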

> - Our service requires accessing these files in a low latency fashion,

HDFS does this pretty well.  You may, at some point, overload the namenode
with file open traffic since it is involved in telling you where file blocks live.
Hadoop is, as you have noted, optimized for reading large files.  That means
that the design assumption is that readers will spend much more time reading
than opening.  For your application this assumption is probably a bit wrong.

> - Are there any options for providing data reliability without the need
> for complete replication (wasting storage space). For example
> performing "raid xor" type operations between chunks/blocks?

Replication is not as evil as you might think because it allows you to
operate with very low cost storage devices and gives you increased read
speeds while you are at it.
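For what it's worth, the "raid xor" idea from the question is simple enough to sketch, even though it is not something HDFS does for you. The names below are hypothetical; this just illustrates the parity scheme being asked about:

```java
// Sketch of RAID-style XOR parity across equal-size blocks (hypothetical;
// not an HDFS feature -- it illustrates the scheme from the question).
public class XorParity {
    // XOR all data blocks together to form a single parity block.
    static byte[] parity(byte[][] blocks) {
        byte[] p = new byte[blocks[0].length];
        for (byte[] b : blocks)
            for (int i = 0; i < p.length; i++)
                p[i] ^= b[i];
        return p;
    }

    // Any one lost block is the XOR of the parity with the survivors.
    static byte[] recover(byte[] parity, byte[][] surviving) {
        byte[] lost = parity.clone();
        for (byte[] b : surviving)
            for (int i = 0; i < lost.length; i++)
                lost[i] ^= b[i];
        return lost;
    }

    public static void main(String[] args) {
        byte[][] blocks = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] p = parity(blocks);
        // Pretend blocks[1] was lost; rebuild it from parity + the rest.
        byte[] recovered = recover(p, new byte[][] { blocks[0], blocks[2] });
        System.out.println(java.util.Arrays.toString(recovered)); // [4, 5, 6]
    }
}
```

Note the trade-off this hides: parity tolerates only one loss per group and recovery requires reading every surviving block, whereas plain replication lets any copy serve reads directly, which is part of why it buys you read speed as well as safety.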

You can easily achieve < $1 / GB of raw storage even at small volumes (I
just bought a single rackmount Dell for home use that achieved this).  For
reasonably large installations (>8T), you can get near 0.5 $/GB without
pushing matters that hard.  This compares to several $/GB for any serious
traditional storage solution such as NetApp, especially since you need to
deal with replication for traditional solutions if you require 100% uptime.
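The arithmetic behind that claim is straightforward; the prices below are the illustrative figures from this message, not quotes:

```java
// Worked version of the cost comparison above (illustrative numbers only).
public class StorageCost {
    // Each usable GB consumes `replication` raw GB.
    static double usableCostPerGB(double rawCostPerGB, int replication) {
        return rawCostPerGB * replication;
    }

    public static void main(String[] args) {
        // $0.50/GB raw commodity storage, triplicated:
        System.out.println(usableCostPerGB(0.50, 3)); // 1.5
        // Still well under "several $/GB" for a traditional filer --
        // before that filer is itself replicated for uptime.
    }
}
```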

My practice grid at work has a dozen machines in it (more or less) that were
cast-offs from normal production.  Several of these machines "survived" a
cooling system failure in the colocation facility and are distinctly flaky.
Moreover, they are low priority machines subject to re-racking if higher
priority tasks turn up.  In several weeks of use, running with a replication
level of 2, this batch of losers has not lost any data at all even though 3
machines have failed, 2 have been taken down without notice to add disk and
4 had to be relocated with little notice.  This experience has really sold
me on the superiority of distributed file systems based on simple
replication.

The net is that replication is a virtue, not really a problem.
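A back-of-the-envelope way to see why replication 2 held up: a block is only lost if every one of its replicas fails before re-replication catches up. Assuming (simplistically) independent failures within that window:

```java
// Sketch: odds that a given block loses all replicas before the cluster
// re-replicates it, assuming independent node failures in that window.
// The 5% figure is an invented illustration, not a measurement.
public class LossOdds {
    static double blockLossProb(double nodeFailProb, int replication) {
        return Math.pow(nodeFailProb, replication);
    }

    public static void main(String[] args) {
        // Even flaky cast-off hardware: say 5% chance a node dies
        // before its blocks are re-replicated, with 2 copies held.
        System.out.println(blockLossProb(0.05, 2));
    }
}
```

The exponent is what matters: each extra replica multiplies the loss probability by the (small) per-node failure rate, which is why even replication 2 on unreliable machines can survive the kind of abuse described above.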

> - Are there any other DFS's you'd recommend looking into, which might
> better fit our requirements?

Look at MogileFS as well.  It is designed for backing up large scale web
properties.  The namenode is kept in a database which makes read-only
fallback operation relatively simple.  It is also intended to support
scaling to larger numbers of smaller objects than HDFS.
