Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of quentin.ambard@gmail.com
 designates 209.85.214.176 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAOcnVr3MRQ3y-QvwjFC1yJh_ds4GtEu8dguh=wvgJ3GQZcNu-Q@mail.gmail.com>
References: 
 <CAKkKj_Bu991wOTZFj9guoE3Omhbg3hRmaU3VenhUvrXWGuaNJQ@mail.gmail.com>
 <CAOcnVr3MRQ3y-QvwjFC1yJh_ds4GtEu8dguh=wvgJ3GQZcNu-Q@mail.gmail.com>
From: Quentin Ambard <quentin.ambard@gmail.com>
Date: Thu, 22 Nov 2012 22:43:35 +0100
Message-ID: 
 <CAKkKj_BfhAPTHOh=EE+YGurnCOooCwaUyUQSj+vTdeihcvai-g@mail.gmail.com>
Subject: Re: changing ha failover auto conf value
To: user <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=e89a8fb1f3be1eda8204cf1c5db6

--e89a8fb1f3be1eda8204cf1c5db6
Content-Type: text/plain; charset=ISO-8859-1

Hi,
Here is what I'm doing:

NN1 (active) + ZKFC1
NN2 (standby) + ZKFC2

First I stop the ZKFC1 service =>
NN1 (standby)
NN2 (active) + ZKFC2

Then I kill the active NameNode: kill -9 on the NN2 process

NN1 stays in standby

ZKFC2 log:

2012-11-22 22:23:40,073 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Checking for any old active which needs to be fenced...
2012-11-22 22:23:40,081 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old
node exists:
0a096d79636c757374657212036e6e321a106e733233363833342e6f76682e6e657420d43e28d33e
2012-11-22 22:23:40,082 INFO org.apache.hadoop.ha.ZKFailoverController:
Should fence: NameNode at /nn2:8020
2012-11-22 22:23:40,205 INFO org.apache.hadoop.ha.ZKFailoverController:
Successfully transitioned NameNode at /nn2:8020 to standby state without
fencing
2012-11-22 22:23:40,205 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Writing znode /hadoop-ha/mycluster/ActiveBreadCrumb to indicate that the
local node is the most recent active...
2012-11-22 22:23:40,233 INFO org.apache.hadoop.ha.ZKFailoverController:
Trying to make NameNode at xxxx/nn1:8020 active...
2012-11-22 22:23:40,605 INFO org.apache.hadoop.ha.ZKFailoverController:
Successfully transitioned NameNode at xxxx/nn1:8020 to active state
2012-11-22 22:24:14,073 WARN org.apache.hadoop.ha.HealthMonitor:
Transport-level exception trying to monitor health of NameNode at
xxxx/nn1:8020: Failed on local exception: java.io.IOException: Response is
null.; Host Details : local host is: "xxxx/nn1"; destination host is:
"xxxx":8020;
2012-11-22 22:24:14,074 INFO org.apache.hadoop.ha.HealthMonitor: Entering
state SERVICE_NOT_RESPONDING
2012-11-22 22:24:14,074 INFO org.apache.hadoop.ha.ZKFailoverController:
Local service NameNode at xxxx/nn1:8020 entered state:
SERVICE_NOT_RESPONDING
2012-11-22 22:24:14,074 INFO org.apache.hadoop.ha.ZKFailoverController:
Quitting master election for NameNode at xxxx/nn1:8020 and marking that
fencing is necessary
2012-11-22 22:24:14,074 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Yielding from election
2012-11-22 22:24:14,128 INFO org.apache.zookeeper.ZooKeeper: Session:
0x23b29574aed0014 closed
2012-11-22 22:24:14,128 WARN org.apache.hadoop.ha.ActiveStandbyElector:
Ignoring stale result from old client with sessionId 0x23b29574aed0014
2012-11-22 22:24:14,128 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2012-11-22 22:24:16,129 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn1:8020. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
2012-11-22 22:24:16,130 WARN org.apache.hadoop.ha.HealthMonitor:
Transport-level exception trying to monitor health of NameNode at
xxxx/nn1:8020: Call From xxxx/nn1 to xxxx:8020 failed on connection
exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
2012-11-22 22:24:18,131 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn1:8020. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
2012-11-22 22:24:18,131 WARN org.apache.hadoop.ha.HealthMonitor:
Transport-level exception trying to monitor health of NameNode at
xxxx/nn1:8020: Call From xxxx/nn1 to xxxx:8020 failed on connection
exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
2012-11-22 22:24:20,133 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn1:8020. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
2012-11-22 22:24:20,133 WARN org.apache.hadoop.ha.HealthMonitor:
Transport-level exception trying to monitor health of NameNode at
xxxx/nn1:8020: Call From xxxx/nn1 to xxxx:8020 failed on connection
exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
2012-11-22 22:24:22,135 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn1:8020. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
2012-11-22 22:24:22,136 WARN org.apache.hadoop.ha.HealthMonitor:
Transport-level exception trying to monitor health of NameNode at
xxxx/nn1:8020: Call From xxxx/nn1 to xxxx:8020 failed on connection
exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
...
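
For reference, the hex string logged after "Old node exists:" above can be
decoded by hand. The sketch below assumes the breadcrumb znode holds a
protobuf-encoded record whose fields 1 to 5 are nameservice ID, NameNode ID,
hostname, RPC port and ZKFC port. That layout is an assumption based on how
the bytes parse, not something stated in this thread, and the helper names
are purely illustrative.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_fields(payload_hex):
    """Return {field_number: value} for varint and length-delimited fields."""
    buf = bytes.fromhex(payload_hex)
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint (ports)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited, assumed to be UTF-8 text
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[field_number] = value
    return fields


# Payload copied from the "Old node exists:" log line.
BREADCRUMB_HEX = (
    "0a096d79636c757374657212036e6e321a10"
    "6e733233363833342e6f76682e6e6574"
    "20d43e28d33e"
)

# Assumed meaning of each field number (hypothetical labels).
FIELD_NAMES = {1: "nameservice", 2: "namenode", 3: "hostname",
               4: "port", 5: "zkfc_port"}

if __name__ == "__main__":
    for number, value in sorted(decode_fields(BREADCRUMB_HEX).items()):
        print(FIELD_NAMES.get(number, f"field {number}"), "=", value)
```

With the payload above, field 2 decodes to nn2 and field 4 to 8020, which is
consistent with the "Should fence: NameNode at /nn2:8020" line that follows.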


NN1 logs:
2012-11-22 22:23:40,109 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services
started for active state
2012-11-22 22:23:40,109 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 166
2012-11-22 22:23:40,110 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 0Number of transactions batched in Syncs:
0 Number of syncs: 1 SyncTimes(ms): 32 125
2012-11-22 22:23:40,182 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 0Number of transactions batched in Syncs:
0 Number of syncs: 2 SyncTimes(ms): 85 144
2012-11-22 22:23:40,196 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file /home/hdfs/dfs/name/current/edits_inprogress_0000000000000000166 ->
/home/hdfs/dfs/name/current/edits_0000000000000000166-0000000000000000167
2012-11-22 22:23:40,196 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services
required for standby state
2012-11-22 22:23:40,198 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on
active node at /nn2:8020 every 120 seconds.
2012-11-22 22:23:40,199 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting
standby checkpoint thread...
Checkpointing active NN at nn2:50070
Serving checkpoints at xxxx/nn1:50070
2012-11-22 22:25:40,235 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log
roll on remote NameNode /nn2:8020
2012-11-22 22:25:41,248 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn2:8020. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-11-22 22:25:42,258 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn2:8020. Already tried 1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-11-22 22:25:43,268 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn2:8020. Already tried 2 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-11-22 22:25:44,279 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn2:8020. Already tried 3 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-11-22 22:25:45,289 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn2:8020. Already tried 4 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-11-22 22:25:46,300 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn2:8020. Already tried 5 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-11-22 22:25:47,310 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: xxxx/nn2:8020. Already tried 6 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
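
The 120-second roll interval announced by EditLogTailer can be checked against
the timestamps above. This is a minimal sketch, assuming log lines follow the
"date time,millis LEVEL logger: message" layout seen here and that the
continuation lines produced by mail wrapping should be rejoined with a single
space. The function names are hypothetical.

```python
import re
from datetime import datetime

# A record starts with a timestamp such as "2012-11-22 22:23:40,198 ".
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)


def unwrap(lines):
    """Rejoin mail-wrapped continuation lines onto the record they belong to."""
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if TS_RE.match(text) or not records:
            records.append(text)
        else:
            records[-1] += " " + text
    return records


def parse_record(record):
    """Return (timestamp, level, short logger name, message) or None."""
    match = LINE_RE.match(record)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    short_logger = match.group("logger").rsplit(".", 1)[-1]
    return timestamp, match.group("level"), short_logger, match.group("msg")


# Two records copied from the NN1 log, still wrapped as in the mail.
WRAPPED = """\
2012-11-22 22:23:40,198 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on
active node at /nn2:8020 every 120 seconds.
2012-11-22 22:25:40,235 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log
roll on remote NameNode /nn2:8020
""".splitlines()

if __name__ == "__main__":
    events = [parse_record(record) for record in unwrap(WRAPPED)]
    gap = events[1][0] - events[0][0]
    print(f"{gap.total_seconds():.3f} s between announcement and first roll")
```

The gap between the two records is roughly 120 seconds, so NN1 was still
running its standby log-roll schedule against nn2 at that point.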

Thanks for your help

2012/11/22 Harsh J <harsh@cloudera.com>

> Hi,
>
> Losing a complete node (ZKFC plus NN) with a journal node (QJM)
> configuration shouldn't be causing automatic failover to fail. Could
> you post up both your NameNode and ZKFC logs somewhere we can take a
> look?
>
> On Fri, Nov 23, 2012 at 12:41 AM, Quentin Ambard
> <quentin.ambard@gmail.com> wrote:
> > Hello,
> > I have 2 NameNodes in HA mode, running with 3 JournalNodes, 3 ZooKeeper
> > servers and 2 ZKFCs (one with each NameNode).
> >
> > If the server hosting the active NameNode and its ZKFC goes down, the
> > remaining ZKFC can't activate the standby NameNode.
> >
> > So I end up with a single NameNode in standby mode.
> > I can try to activate it with the following:
> > hdfs haadmin -transitionToActive nn1 --forcemanual
> >
> > But it's recommended to disable automatic failover to avoid split-brain.
> > To do so, I stop all my NameNodes and set the
> > dfs.ha.automatic-failover.enabled property to false.
> >
> > However, restarting the NameNodes doesn't change this configuration; I'm
> > still getting the same warning when trying to activate the NameNode.
> >
> > How can I change this configuration value?
> >
> > Do I really need 3 NameNodes to avoid this situation (manual NameNode
> > activation), or can I achieve a fully automatic configuration with only
> > 2 NameNodes?
> >
> >
> > Thanks for your help
> >
> >
> > --
> > Quentin Ambard
>
>
>
> --
> Harsh J
>


-- 
Quentin Ambard

--e89a8fb1f3be1eda8204cf1c5db6--