From: dlmarion <dlmarion@hotmail.com>
Reply-To: user@hadoop.apache.org
Date: Fri, 14 Mar 2014 22:53:03 -0400
Subject: RE: HA NN Failover question

Apache Hadoop 2.3.0


Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone


-------- Original message --------
From: Azuryy
Date: 03/14/2014 10:45 PM (GMT-05:00)
To: user@hadoop.apache.org
Subject: Re: HA NN Failover question

Which Hadoop version are you using?


Sent from my iPhone5s

On March 15, 2014, at 9:29, dlmarion <dlmarion@hotmail.com> wrote:

Server 1: NN1 and ZKFC1
Server 2: NN2 and ZKFC2
Server 3: Journal1 and ZK1
Server 4: Journal2 and ZK2
Server 5: Journal3 and ZK3
Server 6+: Datanode

All in the same rack. I would expect the ZKFC from the active name node server to lose its lock and the other ZKFC to tell the standby namenode that it should become active (I’m assuming that’s how it works).

- Dave
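
The expectation described above can be sketched as a toy model. This is a minimal illustration, assuming the two ZKFCs coordinate through a single ephemeral lock held in ZooKeeper; `ToyLock` and its methods are hypothetical names made up for this sketch and are not part of the ZooKeeper or Hadoop APIs.

```python
class ToyLock:
    """Hypothetical stand-in for an ephemeral ZooKeeper lock node."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        # Only one ZKFC may hold the lock at a time.
        if self.holder is None:
            self.holder = name
            return True
        return False

    def session_expired(self, name):
        # An ephemeral node disappears when its owner's session times out.
        if self.holder == name:
            self.holder = None


lock = ToyLock()
lock.try_acquire("ZKFC1")              # ZKFC1 wins, so NN1 is active
assert not lock.try_acquire("ZKFC2")   # ZKFC2 waits, so NN2 stays standby

lock.session_expired("ZKFC1")          # Server 1 drops off the network
won = lock.try_acquire("ZKFC2")        # ZKFC2 can now take the lock
print(won, lock.holder)                # prints: True ZKFC2
```

The model only covers the lock handoff; it says nothing about what else has to happen before the standby is actually promoted.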


From: Juan Carlos [mailto:jucaf1@gmail.com]
Sent: Friday, March 14, 2014 9:12 PM
To: user@hadoop.apache.org
Subject: Re: HA NN Failover question

Hi Dave,

How many ZooKeeper servers do you have, and where are they?


Juan Carlos Fernández Rodríguez


On 15/03/2014, at 01:21, dlmarion <dlmarion@hotmail.com> wrote:

I was doing some testing with HA NN today. I set up two NNs with active failover (ZKFC) using sshfence. I tested that it’s working on both NNs by doing ‘kill -9 <pid>’ on the active NN. When I did this on the active node, the standby would become the active and everything seemed to work. Next, I logged onto the active NN and did a ‘service network stop’ to simulate a NIC/network failure. The standby did not become the active in this scenario. In fact, it remained in standby mode and complained in the log that it could not communicate with (what was) the active NN. I was unable to find anything relevant via searches in Google or in Jira. Does anyone have experience successfully testing this? I’m hoping that it is just a configuration problem.

FWIW, when the network was restarted on the active NN, it failed over almost immediately.

Thanks,

Dave
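
One possible reading of the two test results, offered as an assumption rather than a confirmed diagnosis: the standby may only be promoted once a fencing method succeeds, and sshfence has to reach the old active host over SSH to do its job. The sketch below models that idea. `sshfence`, `try_failover`, and their parameters are hypothetical stand-ins written for this illustration, not Hadoop code.

```python
def sshfence(target_reachable):
    """Toy stand-in for sshfence: it can only succeed if the target accepts SSH."""
    return target_reachable


def try_failover(lock_free, fencing_methods, target_reachable):
    """Return the standby's resulting state in this simplified model."""
    if not lock_free:
        return "standby"
    # Try each fencing method in order; promote only if one succeeds.
    for method in fencing_methods:
        if method(target_reachable):
            return "active"
    return "standby"


# kill -9 on the NameNode: the host is still reachable, so sshfence can run.
after_kill = try_failover(True, [sshfence], target_reachable=True)
# service network stop: the lock is free, but SSH cannot reach the host.
after_net_stop = try_failover(True, [sshfence], target_reachable=False)
# Network restored: fencing succeeds on the next attempt.
after_net_restore = try_failover(True, [sshfence], target_reachable=True)
print(after_kill, after_net_stop, after_net_restore)  # prints: active standby active
```

Under these assumptions the model reproduces all three observations in the message, including the near-immediate failover once the network came back. Whether this matches the real cluster would need to be checked against the ZKFC logs on the standby.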
