hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Multiple master nodes
Date Fri, 01 Aug 2008 14:09:05 GMT
I've been wondering about DRBD.  When I looked at it many (5+?) years ago, it required too
much low-level tinkering and hardware I did not have.  What does it take to set it up now,
and were there any Hadoop-specific things you needed to do?  Overall, are you happy with
DRBD?  (You are limited to 2 nodes, right?)


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: paul <paulgnyc@gmail.com>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, July 29, 2008 2:56:44 PM
> Subject: Re: Multiple master nodes
> 
> I'm currently running with your option B setup and it seems to be reliable
> for me (so far).  I use a combination of DRBD and various heartbeat/Linux-HA
> scripts that handle the failover process, including a virtual IP for the
> namenode.  I haven't had any real-world unexpected failures to deal with
> yet, but all manual testing has had consistent and reliable results.
> 
> 
> 
> -paul
> 
> 
> On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih wrote:
> 
> > Dear Hadoop Community --
> >
> > I am wondering if it is already possible or in the plans to add capability
> > for multiple master nodes. I'm in a situation where I have a master node
> > that may potentially be in a less than ideal execution and networking
> > environment. For this reason, it's possible that the master node could die
> > at any time. On the other hand, the application must always be available. I
> > have access to other machines, but I'm still unclear on the best method to
> > add reliability.
> >
> > Here are a few options that I'm exploring:
> > a) Create a completely separate secondary Hadoop cluster that we can flip to
> > when we detect that the master node has died. This will double hardware
> > costs, so if we originally have a 5 node cluster, then we would need to pull
> > 5 more machines out of somewhere for this option. This is not the preferred
> > choice.
> > b) Just mirror the master node via other always-available software, such as
> > DRBD for real-time synchronization. Upon detecting a failure we could swap
> > to the alternate node.
> > c) Or if Hadoop had some functionality already in place, it would be
> > fantastic to be able to take advantage of that. I don't know if anything
> > like this is available but I could not find anything as of yet. It seems to
> > me, however, that having multiple master nodes would be the direction Hadoop
> > needs to go if it is to be useful in high availability applications. I was
> > told there are some papers on Amazon's Elastic Computing that I'm about to
> > look for that follow this approach.
> >
> > In any case, could someone with experience in solving this type of problem
> > share how they approached this issue?
> >
> > Thanks!
> >
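
For illustration only, the "detect, then swap to the alternate node" step from option b (and from paul's heartbeat-based setup) might be sketched roughly as below. This is a minimal sketch under stated assumptions, not how Heartbeat/Linux-HA or DRBD actually work internally: `namenode_reachable`, `FailoverMonitor`, and the `max_misses` threshold are hypothetical names introduced here, and the real work of promoting the standby (mounting the replicated volume, moving the virtual IP, starting the namenode) is left to the caller.

```python
import socket
from typing import Callable


def namenode_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical probe: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


class FailoverMonitor:
    """Trigger failover once after `max_misses` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], max_misses: int = 3) -> None:
        if max_misses < 1:
            raise ValueError("max_misses must be at least 1")
        self.probe = probe
        self.max_misses = max_misses
        self.misses = 0
        self.failed_over = False

    def tick(self) -> bool:
        """Run one probe; return True only on the tick that triggers failover."""
        if self.failed_over:
            # Already swapped to the standby; do not trigger again.
            return False
        if self.probe():
            # A single successful probe resets the miss counter.
            self.misses = 0
            return False
        self.misses += 1
        if self.misses >= self.max_misses:
            self.failed_over = True
            return True
        return False
```

A caller would invoke `tick()` on a fixed interval and, when it returns True, run whatever promotion steps the site uses. Requiring several consecutive misses is a design choice meant to avoid failing over on a single dropped packet; the right threshold depends on the environment.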

