From: paul
To: core-user@hadoop.apache.org
Date: Fri, 1 Aug 2008 13:19:01 -0400
Subject: Re: Multiple master nodes

Otis,

The DRBD setup is relatively straightforward now, and the documentation is pretty thorough at http://www.drbd.org/users-guide/. I only run a two-node setup for the masters, so a one-to-many replication scheme is outside of my requirements.

I'm currently running my cluster on CentOS 5, and there are RPMs available for DRBD through the extras repository with the following packages:

drbd82.x86_64
kmod-drbd82.x86_64

There's nothing Hadoop-specific, other than starting up the right services in the right order when using heartbeat. (The secondary server does not run its namenode processes while it's in standby mode.)
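For illustration, that startup ordering can be captured in a Heartbeat v1 /etc/ha.d/haresources line. This is a sketch rather than my actual config: the virtual IP (10.6.5.60) and the hadoop-namenode init script name are placeholders; only the node name matches the drbd.conf further down.

### haresources sketch (illustrative) ###
# Resources start left-to-right on takeover and stop right-to-left:
# promote DRBD, mount the filesystem, claim the VIP, start the namenode.
grid101.domain.prod drbddisk::r0 \
    Filesystem::/dev/drbd0::/hadoop::ext3 \
    IPaddr::10.6.5.60/24 \
    hadoop-namenode
### end haresources sketch ###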
This is no different from many other apps in an HA scenario, so it's hard to even call it Hadoop-specific.

As far as being happy with it, yes, so far I am. I've had enough history with DRBD over the past four years that I'm pretty comfortable with its reliability and performance. I've also done replication of data sets much larger than the namenode's with negligible performance overhead (after the initial sync). Your mileage may vary based on the change rate of your namenode's data, but for our purposes there is little to no concern.

Here are a few more details on my current configuration...

I do not use a crossover cable between the nodes, as you'll often see recommended by the documentation and howtos. Instead, since my servers each have two NICs, I use bonding with LACP and use the bond0 device for both my regular traffic and my DRBD replication. With this setup, I'd have to lose two NICs (and two switches on my network) in order to have a complete network failure and risk any split brain.

My /etc/drbd.conf is pretty simple:

#
# drbd.conf example
#
global {
  usage-count no;
}
resource r0 {
  protocol C;
  syncer {
    rate 110M;
  }
  startup {
    wfc-timeout 0;
    degr-wfc-timeout 120;
  }
  on grid102.domain.prod {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   10.6.5.62:7788;
    meta-disk internal;
  }
  on grid101.domain.prod {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   10.6.5.61:7788;
    meta-disk internal;
  }
}
#
# end drbd.conf
#

And a single entry in /etc/fstab:

/dev/drbd0  /hadoop  ext3  defaults,noauto  0 0

Obviously there's more to creating the device and file system, but there are pretty clear instructions on this in the user guide. I do most of it through some scripts that I keep around for building cluster masters and nodes in my environment, which the following lines come from:

### start script ###
SOURCE_DIR=/mnt/hadoop/dist
mkdir -p /hadoop
echo "/dev/drbd0 /hadoop ext3 noauto 1 2" >> /etc/fstab
yum -y install drbd82 kmod-drbd82
/bin/cp $SOURCE_DIR/drbd.conf /etc
chkconfig drbd on
yes | drbdadm create-md r0
service drbd start
# run only on primary, manually
# drbdadm -- --overwrite-data-of-peer primary r0
### end script ###

(fdisk of the volume and mkfs need to be added in there at the end; see the sketch below.)
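The missing pieces would look something like the following. This is a sketch, not part of my actual build script: the partitioning step is interactive and is shown only as a comment, and the ordering matters because with meta-disk internal the DRBD metadata lives at the end of /dev/sda4, so the filesystem has to be created on /dev/drbd0 after the resource is up and one node has been promoted.

### volume setup sketch (not in the script above) ###
# 1. On both nodes, create the backing partition BEFORE running
#    "drbdadm create-md r0" (interactive; device names as in drbd.conf):
#      fdisk /dev/sda        -> create /dev/sda4, type 83 (Linux)
# 2. Then, on the primary only, after "service drbd start":
drbdadm -- --overwrite-data-of-peer primary r0   # promote this node
mkfs.ext3 /dev/drbd0                             # filesystem on the DRBD device
mount /hadoop                                    # uses the fstab entry above
### end sketch ###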
If you have any more questions on the setup, let me know and I'll try to answer them for you.

-paul

On Fri, Aug 1, 2008 at 10:09 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> I've been wondering about DRBD. Many (5+?) years ago when I looked at
> DRBD it required too much low-level tinkering and required hardware I
> did not have. I wonder what it takes to set it up now and if there are
> any Hadoop-specific things you needed to do? Overall, are you happy
> with DRBD? (you are limited to 2 nodes, right?)
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: paul
> > To: core-user@hadoop.apache.org
> > Sent: Tuesday, July 29, 2008 2:56:44 PM
> > Subject: Re: Multiple master nodes
> >
> > I'm currently running with your option B setup and it seems to be
> > reliable for me (so far). I use a combination of DRBD and various
> > heartbeat/LinuxHA scripts that handle the failover process, including
> > a virtual IP for the namenode. I haven't had any real-world unexpected
> > failures to deal with yet, but all manual testing has had consistent
> > and reliable results.
> >
> > -paul
> >
> > On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih wrote:
> >
> > > Dear Hadoop Community --
> > >
> > > I am wondering if it is already possible, or in the plans, to add
> > > capability for multiple master nodes. I'm in a situation where I
> > > have a master node that may potentially be in a less than ideal
> > > execution and networking environment. For this reason, it's possible
> > > that the master node could die at any time. On the other hand, the
> > > application must always be available. I have other machines
> > > accessible to me, but I'm still unclear on the best method to add
> > > reliability.
> > >
> > > Here are a few options that I'm exploring:
> > > a) Create a completely secondary Hadoop cluster that we can flip to
> > > when we detect that the master node has died. This doubles hardware
> > > costs, so if we originally have a 5-node cluster, we would need to
> > > pull 5 more machines out of somewhere. This is not the preferable
> > > choice.
> > > b) Mirror the master node via other always-available software, such
> > > as DRBD, for real-time synchronization. Upon detecting a failure we
> > > could swap to the alternate node.
> > > c) Or, if Hadoop had some functionality already in place for this,
> > > it would be fantastic to take advantage of that. I don't know if
> > > anything like this is available, but I could not find anything as of
> > > yet. It seems to me, however, that having multiple master nodes is
> > > the direction Hadoop needs to go if it is to be useful in
> > > high-availability applications. I was told there are some papers on
> > > Amazon's Elastic Computing that I'm about to look for that follow
> > > this approach.
> > >
> > > In any case, could someone with experience in solving this type of
> > > problem share how they approached this issue?
> > >
> > > Thanks!
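For concreteness, the manual takeover that option (b) describes, and that the heartbeat/LinuxHA scripts above automate, comes down to roughly the following on the standby master. This is a sketch with assumed values: the virtual IP address, the bond0 interface, and hadoop-daemon.sh as the start mechanism are illustrative, not a confirmed recipe.

### manual failover sketch (illustrative) ###
# Run on the standby master once the old primary is confirmed down:
drbdadm primary r0                      # promote the replicated volume
mount /dev/drbd0 /hadoop                # expose the namenode metadata
ip addr add 10.6.5.60/24 dev bond0      # claim the virtual IP (assumed)
hadoop-daemon.sh start namenode         # restart the namenode on this node
### end sketch ###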