From: paul
To: core-user@hadoop.apache.org
Date: Fri, 1 Aug 2008 13:19:01 -0400
Subject: Re: Multiple master nodes

Otis,

The DRBD setup is relatively straightforward now, and the documentation is pretty thorough at http://www.drbd.org/users-guide/. I only run a two-node setup for the masters, so a one-to-many replication scheme is outside of my requirements.

I'm currently running my cluster on CentOS 5, and there are RPMs available for DRBD through the extras repository with the following packages:

drbd82.x86_64
kmod-drbd82.x86_64

There's nothing Hadoop-specific, other than starting up the right services in the right order when using heartbeat. (The secondary server does not run its namenode processes while it's in standby mode.)
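For illustration, that startup ordering can be captured in a Heartbeat v1 /etc/ha.d/haresources line. This is a sketch rather than my actual config: the virtual IP (10.6.5.60) and the hadoop-namenode init script name are placeholders; only the node name matches the drbd.conf further down.

### haresources sketch (illustrative) ###
# Resources start left-to-right on takeover and stop right-to-left:
# promote DRBD, mount the filesystem, claim the VIP, start the namenode.
grid101.domain.prod drbddisk::r0 \
    Filesystem::/dev/drbd0::/hadoop::ext3 \
    IPaddr::10.6.5.60/24 \
    hadoop-namenode
### end haresources sketch ###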
This is no different from many other apps in an HA scenario, so it's hard to even call it Hadoop-specific.

As far as being happy with it, yes, so far I am. I've had enough history with DRBD over the past four years that I'm pretty comfortable with its reliability and performance. I've also done replication of data sets much larger than the namenode's with negligible performance overhead (after the initial sync). Your mileage may vary based on the change rate of your namenode's data, but for our purposes there is little to no concern.

Here are a few more details on my current configuration...

I do not use a crossover cable between the nodes, as you'll often see recommended by the documentation and howtos. Instead, since my servers each have two NICs, I use bonding with LACP and use the bond0 device for both my regular traffic and my DRBD replication. With this setup, I'd have to lose two NICs (and two switches on my network) in order to have a complete network failure and risk any split brain.

My /etc/drbd.conf is pretty simple:

#
# drbd.conf example
#
global {
  usage-count no;
}
resource r0 {
  protocol C;
  syncer {
    rate 110M;
  }
  startup {
    wfc-timeout 0;
    degr-wfc-timeout 120;
  }
  on grid102.domain.prod {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   10.6.5.62:7788;
    meta-disk internal;
  }
  on grid101.domain.prod {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   10.6.5.61:7788;
    meta-disk internal;
  }
}
#
# end drbd.conf
#

And a single entry in /etc/fstab:

/dev/drbd0  /hadoop  ext3  defaults,noauto  0 0

Obviously there's more to creating the device and file system, but there are pretty clear instructions on this in the user guide. I do most of it through some scripts that I keep around for building cluster masters and nodes in my environment, which the following lines come from:

### start script ###
SOURCE_DIR=/mnt/hadoop/dist
mkdir -p /hadoop
echo "/dev/drbd0 /hadoop ext3 noauto 1 2" >> /etc/fstab
yum -y install drbd82 kmod-drbd82
/bin/cp $SOURCE_DIR/drbd.conf /etc
chkconfig drbd on
yes | drbdadm create-md r0
service drbd start
# run only on primary, manually
# drbdadm -- --overwrite-data-of-peer primary r0
### end script ###

(fdisk of the volume and mkfs need to be added in there at the end; see the sketch below.)
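The missing pieces would look something like the following. This is a sketch, not part of my actual build script: the partitioning step is interactive and is shown only as a comment, and the ordering matters because with meta-disk internal the DRBD metadata lives at the end of /dev/sda4, so the filesystem has to be created on /dev/drbd0 after the resource is up and one node has been promoted.

### volume setup sketch (not in the script above) ###
# 1. On both nodes, create the backing partition BEFORE running
#    "drbdadm create-md r0" (interactive; device names as in drbd.conf):
#      fdisk /dev/sda        -> create /dev/sda4, type 83 (Linux)
# 2. Then, on the primary only, after "service drbd start":
drbdadm -- --overwrite-data-of-peer primary r0   # promote this node
mkfs.ext3 /dev/drbd0                             # filesystem on the DRBD device
mount /hadoop                                    # uses the fstab entry above
### end sketch ###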
If you have any more questions on the setup, let me know and I'll try to answer them for you.

-paul

On Fri, Aug 1, 2008 at 10:09 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> I've been wondering about DRBD. Many (5+?) years ago when I looked at
> DRBD it required too much low-level tinkering and required hardware I
> did not have. I wonder what it takes to set it up now and if there are
> any Hadoop-specific things you needed to do? Overall, are you happy
> with DRBD? (you are limited to 2 nodes, right?)
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: paul
> > To: core-user@hadoop.apache.org
> > Sent: Tuesday, July 29, 2008 2:56:44 PM
> > Subject: Re: Multiple master nodes
> >
> > I'm currently running with your option B setup and it seems to be
> > reliable for me (so far). I use a combination of DRBD and various
> > heartbeat/LinuxHA scripts that handle the failover process, including
> > a virtual IP for the namenode. I haven't had any real-world unexpected
> > failures to deal with yet, but all manual testing has had consistent
> > and reliable results.
> >
> > -paul
> >
> > On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih wrote:
> >
> > > Dear Hadoop Community --
> > >
> > > I am wondering if it is already possible, or in the plans, to add
> > > capability for multiple master nodes. I'm in a situation where I
> > > have a master node that may potentially be in a less than ideal
> > > execution and networking environment. For this reason, it's possible
> > > that the master node could die at any time. On the other hand, the
> > > application must always be available. I have other machines
> > > accessible to me, but I'm still unclear on the best method to add
> > > reliability.
> > >
> > > Here are a few options that I'm exploring:
> > > a) Create a completely secondary Hadoop cluster that we can flip to
> > > when we detect that the master node has died. This doubles hardware
> > > costs, so if we originally have a 5-node cluster, we would need to
> > > pull 5 more machines out of somewhere. This is not the preferable
> > > choice.
> > > b) Mirror the master node via other always-available software, such
> > > as DRBD, for real-time synchronization. Upon detecting a failure we
> > > could swap to the alternate node.
> > > c) Or, if Hadoop had some functionality already in place for this,
> > > it would be fantastic to take advantage of that. I don't know if
> > > anything like this is available, but I could not find anything as of
> > > yet. It seems to me, however, that having multiple master nodes is
> > > the direction Hadoop needs to go if it is to be useful in
> > > high-availability applications. I was told there are some papers on
> > > Amazon's Elastic Computing that I'm about to look for that follow
> > > this approach.
> > >
> > > In any case, could someone with experience in solving this type of
> > > problem share how they approached this issue?
> > >
> > > Thanks!
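For concreteness, the manual takeover that option (b) describes, and that the heartbeat/LinuxHA scripts above automate, comes down to roughly the following on the standby master. This is a sketch with assumed values: the virtual IP address, the bond0 interface, and hadoop-daemon.sh as the start mechanism are illustrative, not a confirmed recipe.

### manual failover sketch (illustrative) ###
# Run on the standby master once the old primary is confirmed down:
drbdadm primary r0                      # promote the replicated volume
mount /dev/drbd0 /hadoop                # expose the namenode metadata
ip addr add 10.6.5.60/24 dev bond0      # claim the virtual IP (assumed)
hadoop-daemon.sh start namenode         # restart the namenode on this node
### end sketch ###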