From: anil gupta <anilgupta84@gmail.com>
Date: Mon, 14 Jan 2013 20:35:45 -0800
Subject: Re: hadoop namenode recovery
To: user@hadoop.apache.org

Inline

On Mon, Jan 14, 2013 at 7:48 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:

> Hello,
>
> I have another idea regarding solving the single point of failure of
> Hadoop. What if I have multiple NameNodes set up and running behind a
> load balancer in the cluster? This way I can have multiple NameNodes
> behind the single IP address of the load balancer, which resolves the
> problem of failure: if one NameNode goes down, the others are still
> working.

This won't work, since the DataNodes always need to be aware of the
active NameNode to send heartbeats and for other communication.
Hortonworks as well as Cloudera both have solutions for the NameNode's
single point of failure. You will have to analyze the solutions and
pick one.
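For reference, here is a rough sketch of what the 2.x NameNode HA wiring
looks like in hdfs-site.xml. This is illustrative only: the nameservice
name "mycluster", the hostnames, and the shared-edits path are
placeholders, and the exact property set depends on the release you
end up deploying.

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <!-- Shared storage (e.g. an NFS mount) holding the edit log; only
       the active NN writes to it, the standby replays from it. -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>file:///mnt/nfs/ha-edits</value>
  </property>
  <!-- Clients address the logical nameservice, and this proxy provider
       resolves which physical NN is currently active (the part a plain
       load balancer cannot do). -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

Clients then point fs.defaultFS at hdfs://mycluster. Automatic failover
additionally needs ZooKeeper plus the ZKFC daemons; manual failover
works without them.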
> Please suggest; this is just a vague idea!
>
> Thanks
>
> On Mon, Jan 14, 2013 at 7:31 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
>
>> Hello Bejoy,
>>
>> Thank you for the information. About the Hadoop HA 2.x releases: they
>> are in the alpha phase, and I cannot use them in production. For my
>> requirements the cluster is supposed to be extremely available;
>> availability is the highest concern. I have looked into different
>> distributions as well, such as Hortonworks; they have the same
>> single-point-of-failure problem and are waiting for Apache to release
>> Hadoop 2.x.
>>
>> I was wondering if I can somehow configure two NameNodes on the same
>> network with the same IP address, where the second NameNode is
>> switched in only after the failure of the primary; that might help
>> resolve this problem automatically. All the slaves connect to the
>> NameNode through a network alias in their /etc/hosts file.
>> I am trying to implement something like this in the cluster:
>> http://networksandservers.blogspot.de/2011/04/failover-clustering-i.html
>>
>> Please suggest if this is possible.
>>
>> Thanks for your time.
>> Regards,
>> Panshul.
>>
>> On Mon, Jan 14, 2013 at 7:11 PM, <bejoy.hadoop@gmail.com> wrote:
>>
>>> Hi Panshul,
>>>
>>> The SecondaryNameNode is better described as a checkpoint node. At
>>> periodic intervals it merges the edit log from the NN into the
>>> fsimage, to prevent the edit log from growing too large. That is its
>>> main functionality.
>>>
>>> At any point the SNN has the latest fsimage but not the up-to-date
>>> edit log. If the NN goes down and you don't have an up-to-date copy
>>> of the edit log, you can use the fsimage from the SNN for restoring;
>>> in that case you lose the transactions in the edit log.
>>>
>>> The SNN is not a backup NN; it is just a checkpoint node.
>>>
>>> Two or more NNs are not possible in the 1.x releases, but federation
>>> makes it possible in the 2.x releases. Federation serves a different
>>> purpose, though; you should be looking at Hadoop HA in the 2.x
>>> releases.
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from remote device, Please excuse typos
>>> ------------------------------
>>> From: Panshul Whisper <ouchwhisper@gmail.com>
>>> Date: Mon, 14 Jan 2013 19:04:24 -0800
>>> Subject: Re: hadoop namenode recovery
>>>
>>> Thank you for the reply.
>>>
>>> Is there a way I can configure my cluster to switch to the Secondary
>>> NameNode automatically in case of a primary NameNode failure? When I
>>> run my current Hadoop, I see both the primary and secondary
>>> NameNodes running. I was wondering: what is that Secondary NameNode
>>> for, and where is it configured? I was also wondering, is it
>>> possible to have two or more NameNodes running in the same cluster?
>>>
>>> Thanks,
>>> Regards,
>>> Panshul.
>>>
>>> On Mon, Jan 14, 2013 at 6:50 PM, <bejoy.hadoop@gmail.com> wrote:
>>>
>>>> Hi Panshul,
>>>>
>>>> Usually, for reliability, multiple dfs.name.dir directories are
>>>> configured, of which one is a remote location such as an NFS mount.
>>>> That way, even if the NN machine crashes as a whole, you still have
>>>> the fsimage and edit log on the NFS mount, and these can be used to
>>>> reconstruct the NN.
>>>>
>>>> Regards
>>>> Bejoy KS
>>>>
>>>> Sent from remote device, Please excuse typos
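On a 1.x cluster, that advice translates to configuration along these
lines (the paths are placeholders; a minimal sketch, not a tested
recipe):

  <!-- hdfs-site.xml: the NN writes its metadata to every listed
       directory, so the NFS copy survives loss of the NN machine. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
  </property>

  <!-- core-site.xml: where and how often the SecondaryNameNode writes
       its checkpoints of the merged fsimage (period is in seconds). -->
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/data/1/dfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
  </property>

To recover, start a replacement NN whose dfs.name.dir points at the
surviving NFS copy; or, if only the SNN checkpoint survives, start the
new NN with an empty dfs.name.dir and run
"hadoop namenode -importCheckpoint" to load the last checkpoint from
fs.checkpoint.dir, losing the transactions made after that checkpoint,
as described above.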
>>>> ------------------------------
>>>> From: Panshul Whisper <ouchwhisper@gmail.com>
>>>> Date: Mon, 14 Jan 2013 17:25:08 -0800
>>>> ReplyTo: user@hadoop.apache.org
>>>> Subject: hadoop namenode recovery
>>>>
>>>> Hello,
>>>>
>>>> Is there a standard way to prevent a NameNode crash from taking
>>>> down a Hadoop cluster? Or, what is the standard or best practice
>>>> for overcoming Hadoop's single-point-of-failure problem?
>>>>
>>>> I am not ready to take chances on a production server with the
>>>> Hadoop 2.0 alpha release, which claims to have solved the problem.
>>>> Are there any other things I can do to either prevent the failure,
>>>> or recover from the failure in a very short time?
>>>>
>>>> Thanking you,
>>>>
>>>> --
>>>> Regards,
>>>> Ouch Whisper
>>>> 010101010101

--
Thanks & Regards,
Anil Gupta