From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: Multiple HDFS paths and Multiple Masters...
Date Thu, 13 Sep 2007 17:55:21 GMT
I used to work in Netapp HA group - so can explain the single drive
failure stuff a bit (although the right forum is the
toasters@mathworks.com mailing list).

The shelves are supposed to bypass failed drives (the shelf re-routes
the FC loop when it detects failed drives). However, there were rare
drive failure modes where the drive would malfunction - but not in a way
detectable by the shelf - leading to the entire FC-loop malfunctioning -
and leading to multi-disk failure. The disks are dual attached - but in
this failure mode - they would take out both loops.

That said - this is circa 2004/5. New generation shelves fixed the
problems, and Netapp was also asking shelf vendors for a software
interface to power-cycle failed drives (so that Netapp software could
take out bad drives by hard power-reset instead of relying on shelf
firmware). I don't know the current status.

In general - one of the big value adds of using Netapp (or EMC for that
matter) is their extensive understanding of drive/shelf failure modes
and the ability to proactively predict and take safeguard actions
against such failures. 

Regarding RAID mirroring - well - it actually protects against cases
like this (since Netapp always puts mirrored copies on different
shelves/loops - thereby protecting against shelf/loop failure). But
RAID-4/5 (or netapp dual-parity) with backups and/or replication is a
good alternative (with somewhat lower availability guarantees and

Hope this helps ..

-----Original Message-----
From: C G [mailto:parallelguy@yahoo.com] 
Sent: Thursday, September 13, 2007 9:47 AM
To: hadoop-user@lucene.apache.org
Subject: Re: Multiple HDFS paths and Multiple Masters...

Allen, Ted:
  Good stuff...thanks for the information.  Ted, a bit off-topic but
your comment about netapp single drive failures gave me pause,
particularly since we have a large one deployed now.  Would you mind
saying more on that...feel free to contact me direct since it is
  C G

Ted Dunning <tdunning@veoh.com> wrote:

On 9/13/07 6:00 AM, "C G" 

> I'd like to run nodes with around 2T of local disk set up as JBOD. So
> would have 4 separate file systems per machine, for example /hdfs_a,
> /hdfs_c, /hdfs_d . Is it possible to configure things so that HDFS
> about all 4 file systems?

Yes. This is normally done to allow heterogeneity in data/task nodes. You
make a list of all of the file systems that MIGHT be available and Hadoop
figures out which are available and which have space to use.
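
As a rough illustration of the idea above (this is not Hadoop's actual code): given a list of directories that might exist on a node, keep only those that are present and have free space. The function name usable_dirs and the min_free_bytes threshold are made up for this sketch, and the example paths are just the three named in the question. In Hadoop of this era the list itself is, as far as I know, the comma-delimited dfs.data.dir property, where directories that do not exist are ignored.

```python
import os
import shutil


def usable_dirs(candidates, min_free_bytes=0):
    """Return (path, free_bytes) for each candidate directory that exists
    and has at least min_free_bytes of free space.

    Hypothetical helper for illustration only.
    """
    usable = []
    for path in candidates:
        if not os.path.isdir(path):
            # Not present on this node: skip it rather than fail.
            continue
        free = shutil.disk_usage(path).free
        if free >= min_free_bytes:
            usable.append((path, free))
    return usable


if __name__ == "__main__":
    # The same list can be given to every node; each node keeps what it has.
    candidates = ["/hdfs_a", "/hdfs_c", "/hdfs_d"]
    for path, free in usable_dirs(candidates, min_free_bytes=1 << 30):
        print(path, free)
```

The point of the design is that one shared list works across heterogeneous machines: a node with fewer disks simply ends up with a shorter list.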

> Since we're using HDFS replication I see no point in
> using RAID-anything...to me that's the whole point of replication

That is the intent!

> Is it possible to set things up in Hadoop to run multiple masters?

Not yet. Doug makes very good points on this topic that a single master
will be fairly reliable and that it is the cluster that will have common
failures and thus must be robust to node failure.

There are lots of HA options. One that looks very nice to me (but that I
haven't tried) is DRBD, which is a block-level disk replication service. See
http://www.drbd.org/ for more information (and let us know how it

The secondary namenode may be of some help in recovery as well, but it is
unlikely to be as quick as a replicated disk and a CARP based IP

> If you can't run multiple namenodes, then that sort of implies the
> which is hosting *the* namenode needs to do all the traditional things
> protect against data loss/corruption, including frequent backups, RAID
> mirroring, etc. 

Some of these things are happening already, but the others are not a bad
idea at all. Consider your hardware carefully. RAID mirroring can
*decrease* reliability if you get a failure from either drive. It happened to
me on my home machine and I have heard of other cases as well. Even in
sophisticated implementations such as are done by Netapp, you can have
failures that freeze an entire shelf. My preference any more is for
simple machines rather than fancy machines.
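
To put rough numbers on the mirroring point, here is a small sketch. It assumes independent drive failures and a made-up per-drive failure probability p over some period; the function names are hypothetical. An ideal mirror loses data only when both drives fail, while a mirror in which a failure of either drive takes down the pair is worse than a single drive.

```python
def single_drive(p):
    """Probability of losing a lone drive in the period."""
    return p


def ideal_mirror(p):
    """Mirror that survives any single failure: lost only if both drives fail."""
    return p * p


def fragile_mirror(p):
    """Mirror that goes down if EITHER drive fails (the bad case described above)."""
    return 1 - (1 - p) ** 2


if __name__ == "__main__":
    p = 0.05  # illustrative value only, not a measured failure rate
    for name, fn in [("single drive", single_drive),
                     ("ideal mirror", ideal_mirror),
                     ("fragile mirror", fragile_mirror)]:
        print(f"{name}: {fn(p):.4f}")
```

For small p the fragile case is close to 2p, so the extra drive roughly doubles the exposure instead of squaring it away.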

