hadoop-common-user mailing list archives

From dlmarion <dlmar...@hotmail.com>
Subject RE: HA NN Failover question
Date Sat, 15 Mar 2014 01:24:55 GMT
I don't think so. NN1 and ZKFC1 are on a physically separate machine from
NN2 and ZKFC2.

 

From: Chris Mawata [mailto:chris.mawata@gmail.com] 
Sent: Friday, March 14, 2014 9:05 PM
To: user@hadoop.apache.org
Subject: Re: HA NN Failover question

 

Could you have also prevented the standby from communicating with Zookeeper?

Chris

On Mar 14, 2014 8:22 PM, "dlmarion" <dlmarion@hotmail.com> wrote:

I was doing some testing with HA NN today. I set up two NN with active
failover (ZKFC) using sshfence. I tested that it's working on both NN by
doing 'kill -9 <pid>' on the active NN. When I did this on the active node,
the standby would become the active and everything seemed to work. Next, I
logged onto the active NN and did a 'service network stop' to simulate a
NIC/network failure. The standby did not become the active in this scenario.
In fact, it remained in standby mode and complained in the log that it could
not communicate with (what was) the active NN. I was unable to find anything
relevant via searches in Google or in Jira. Does anyone have experience
successfully testing this? I'm hoping that it is just a configuration
problem.

 

FWIW, when the network was restarted on the active NN, it failed over almost
immediately.
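
For readers following along, a setup like the one described (automatic failover via ZKFC with sshfence) would typically correspond to hdfs-site.xml entries roughly like the sketch below. This is not the actual configuration from the cluster under test; the key path and timeout value are illustrative assumptions.

```xml
<!-- hdfs-site.xml (sketch only; values are assumptions, not the poster's config) -->
<property>
  <!-- lets the ZKFC daemons trigger failover automatically -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- fencing methods are attempted in the order listed -->
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <!-- hypothetical path to the private key used by sshfence -->
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
  <!-- SSH connect timeout in milliseconds -->
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
```

Since sshfence works by opening an SSH connection to the previously active node, it may be worth checking the ZKFC log on the standby to see whether the fencing attempt could complete while that node's network was down.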

 

Thanks,

 

Dave

