hadoop-common-user mailing list archives

From C G <parallel...@yahoo.com>
Subject Re: NameNode HA
Date Sat, 24 Nov 2007 04:06:51 GMT
We are rolling out a production grid with 32 compute nodes.  The current plan is to try to
avoid catastrophic namenode failures by:
  1.  Running DRBD to mirror the namenode's data to another machine
  2.  Using multiple namenode volumes to replicate the namespace image and edit logs to yet
another machine (see http://wiki.apache.org/lucene-hadoop/FAQ#15).
  I'm also considering filesystem snapshotting.  The above solutions presume
that the mode of failure is hardware rather than software.  A regular snapshot would be useful
if something bad happened within the Hadoop framework itself and something scribbled all over
the namenode's data.
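
  The regular-snapshot idea could be sketched roughly as below. This is only a minimal
illustration, not what we have deployed: the function name `snapshot_name_dir`, the `name-`
prefix and the `keep` parameter are all made up for the example, and it assumes the copy is
taken while the image and edit log are not being written (a real setup would have to ensure
that, for example with a volume-level snapshot).

```python
import shutil
import time
from pathlib import Path


def snapshot_name_dir(name_dir, snapshot_root, keep=7):
    """Copy name_dir into a new timestamped folder under snapshot_root.

    Hypothetical helper: keeps only the newest `keep` snapshots and
    returns the path of the snapshot it just created.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    src = Path(name_dir)
    root = Path(snapshot_root)
    root.mkdir(parents=True, exist_ok=True)

    # Zero-padded nanosecond timestamps sort chronologically as strings.
    # Retry until the name is unused in case the clock has not advanced.
    while True:
        dest = root / f"name-{time.time_ns():020d}"
        if not dest.exists():
            break

    shutil.copytree(src, dest)

    # Prune everything except the newest `keep` snapshots.
    snapshots = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith("name-")
    )
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return dest
```

  Run from cron against the directory holding the image and edit logs, this would leave a
rolling window of copies to fall back on if the live data were corrupted by software.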
  As we get DRBD deployed and get to production I'll post more about our experiences.
  C G

Erich Nachbar <erich@carrieriq.com> wrote:
  Did anyone try DRBD (http://www.drbd.org/) for mirroring the fsimage 
and editlogs to another machine?

Another idea which would involve code changes is to go to something 
like Terracotta (http://www.terracottatech.com/) essentially allowing 
multiple machines simultaneously to play the role of a namenode. I 
only played around with their samples, but if it works as advertised 
it could be a nice way to spread the load and achieve HA.

Disclaimer: Not affiliated with DRBD or Terracotta. Just in need of an 
(ideally automatic) failover solution to protect my weekends.

On Nov 21, 2007, at 6:51 AM, j2eeiscool wrote:

> Hi Dhruba,
> Thanx for your reply.
> On the first part (NameNode HA and failover), our experience with NFS has
> not been very good.
> Is having a DB as a backing store for the NameNode an option (I understand that
> this may not be part of the current release 0.15.0 and would be a new
> feature)?
> -Taj
> Dhruba Borthakur wrote:
>> Here is some info on recovering from a failed Namenode:
>> http://wiki.apache.org/lucene-hadoop/NameNodeFailover
>> The fact that there is a single Namenode does mean that it could
>> possibly become the bottleneck when many thousands of clients/Datanodes
>> run on the cluster simultaneously. However, the design is such that it
>> is scalable to a huge number of clients/Datanodes. Also, work is going
>> on continuously to improve scalability.
>> Thanks,
>> Dhruba
>> -----Original Message-----
>> From: j2eeiscool [mailto:siddiqut@yahoo.com]
>> Sent: Tuesday, November 20, 2007 12:47 PM
>> To: hadoop-user@lucene.apache.org
>> Subject: NameNode HA
>> Hi,
>> Based on the documentation I have read, there is one instance of a
>> NameNode.
>> Are there recommended approaches for making the NameNode HA:
>> 1. Have a backup which takes over. Data between primary and backup is
>> shared through shared files, a DB, etc.
>> Also, does having a single NameNode limit the number of concurrent HDFS
>> clients? I understand that HDFS readers and writers use the DataNode(s)
>> eventually, but the initial access point is the NameNode.
>> I would really appreciate help on these (I am evaluating HDFS for use as
>> a concurrent, reliable, performant distributed file system).
>> Thanx,
>> Taj
>> -- 
>> View this message in context:
>> http://www.nabble.com/NameNode-HA-tf4846281.html#a13865411
>> Sent from the Hadoop Users mailing list archive at Nabble.com.
> -- 
> View this message in context: http://www.nabble.com/NameNode-HA-tf4846281.html#a13878663
> Sent from the Hadoop Users mailing list archive at Nabble.com.
