hadoop-common-user mailing list archives

From Azuryy <azury...@gmail.com>
Subject Re: HA NN Failover question
Date Sat, 15 Mar 2014 03:35:20 GMT
I assume NN2 is the standby. Please check that ZKFC2 is alive before stopping the network on NN1.

Sent from my iPhone5s

> On Mar 15, 2014, at 10:53, dlmarion <dlmarion@hotmail.com> wrote:
> 
> Apache Hadoop 2.3.0
> 
> 
> Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone
> 
> 
> -------- Original message --------
> From: Azuryy 
> Date: 03/14/2014 10:45 PM (GMT-05:00)
> To: user@hadoop.apache.org 
> Subject: Re: HA NN Failover question 
> 
> Which Hadoop version are you using?
> 
> On Mar 15, 2014, at 9:29, dlmarion <dlmarion@hotmail.com> wrote:
> 
>> Server 1: NN1 and ZKFC1
>> Server 2: NN2 and ZKFC2
>> Server 3: Journal1 and ZK1
>> Server 4: Journal2 and ZK2
>> Server 5: Journal3 and ZK3
>> Server 6+: Datanode
>>  
>> All in the same rack. I would expect the ZKFC on the active NameNode's server to
>> lose its lock and the other ZKFC to tell the standby NameNode that it should become active
>> (I’m assuming that’s how it works).
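The expectation above can be illustrated with a toy Python model. It assumes ZooKeeper-style semantics: the election lock is an ephemeral znode that disappears when its owner's session times out, so a ZKFC cut off from the ZooKeeper servers (here on servers 3 to 5) eventually loses the lock to the ZKFC that is still connected. `FakeZooKeeper`, `simulate`, the 5-second timeout and the `zkfc1`/`zkfc2` names are hypothetical; this is a sketch, not the ZKFC implementation. In a real cluster the winning ZKFC must also fence the old active before promoting its NameNode, a step this model leaves out.

```python
"""Toy model of ZKFC-style election on an ephemeral ZooKeeper lock.

Hypothetical: FakeZooKeeper, simulate(), the 5 s timeout and the client
names are illustrative assumptions, not Hadoop or ZooKeeper APIs.
"""
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeZooKeeper:
    """Stand-in for ZooKeeper: a single lock znode tied to client sessions."""
    session_timeout: float
    lock_owner: Optional[str] = None
    last_heartbeat: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, client: str, now: float) -> None:
        # A client that can still reach the ensemble keeps its session alive.
        self.last_heartbeat[client] = now

    def expire_sessions(self, now: float) -> None:
        # A session silent for longer than the timeout expires, and an
        # ephemeral lock owned by that session is deleted along with it.
        for client, seen in list(self.last_heartbeat.items()):
            if now - seen > self.session_timeout:
                del self.last_heartbeat[client]
                if self.lock_owner == client:
                    self.lock_owner = None

    def try_acquire(self, client: str) -> bool:
        # Only a client with a live session can create the free lock znode.
        if self.lock_owner is None and client in self.last_heartbeat:
            self.lock_owner = client
        return self.lock_owner == client


def simulate(network_down_at: float, end: float, step: float = 1.0) -> Optional[str]:
    """Return which ZKFC holds the lock at time `end`."""
    zk = FakeZooKeeper(session_timeout=5.0)
    t = 0.0
    while t <= end:
        if t < network_down_at:  # ZKFC1 loses connectivity from this time on
            zk.heartbeat("zkfc1", t)
        zk.heartbeat("zkfc2", t)  # ZKFC2 stays connected throughout
        zk.expire_sessions(t)
        zk.try_acquire("zkfc1")  # ZKFC1 asks first, so it wins at t = 0
        zk.try_acquire("zkfc2")
        t += step
    return zk.lock_owner


print(simulate(network_down_at=10, end=30))   # -> zkfc2
print(simulate(network_down_at=100, end=30))  # -> zkfc1
```

In this model the lock changes hands only after ZKFC1's session has been silent for longer than the timeout; before that (for example at `end=12` with the network dropped at 10) ZKFC1 still holds it.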
>>  
>> - Dave
>>  
>> From: Juan Carlos [mailto:jucaf1@gmail.com] 
>> Sent: Friday, March 14, 2014 9:12 PM
>> To: user@hadoop.apache.org
>> Subject: Re: HA NN Failover question
>>  
>> Hi Dave,
>> How many ZooKeeper servers do you have, and where are they?
>> 
>> Juan Carlos Fernández Rodríguez
>> 
>> On 15/03/2014, at 01:21, dlmarion <dlmarion@hotmail.com> wrote:
>> 
>> I was doing some testing with HA NN today. I set up two NNs with automatic failover (ZKFC)
>> using sshfence. I tested that it’s working on both NNs by doing ‘kill -9 <pid>’ on
>> the active NN. When I did this on the active node, the standby became the active and
>> everything seemed to work. Next, I logged onto the active NN and did a ‘service network stop’
>> to simulate a NIC/network failure. The standby did not become the active in this scenario.
>> In fact, it remained in standby mode and complained in the log that it could not communicate
>> with (what was) the active NN. I was unable to find anything relevant via searches in Google
>> or Jira. Does anyone have experience successfully testing this? I’m hoping that it is just
>> a configuration problem.
>>  
>> FWIW, when the network was restarted on the active NN, it failed over almost immediately.
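According to the Apache Hadoop HDFS HA documentation, the ZKFC that wins the election first fences the previous active, trying the methods listed in `dfs.ha.fencing.methods` in order until one reports success, and only then transitions its local NameNode to active. The hypothetical Python sketch below models that gate, assuming an sshfence-like method that can only succeed when the old active's host is reachable over SSH. It is one way to reason about the two observations above (`kill -9` fails over; a stopped network does not until it is restarted), not a confirmed diagnosis; `make_ssh_like_fence`, `failover` and the host name `nn1` are illustrative only.

```python
"""Toy model of the fence-then-promote gate in ZKFC failover.

Hypothetical: make_ssh_like_fence(), failover() and the host name "nn1"
are illustrative assumptions, not Hadoop APIs.
"""
from typing import Callable, List, Set

# A fencing method receives the old active's host name and reports success.
FencingMethod = Callable[[str], bool]


def make_ssh_like_fence(reachable_hosts: Set[str]) -> FencingMethod:
    """Assumed stand-in for sshfence: it can only succeed on a reachable host."""
    def fence(host: str) -> bool:
        return host in reachable_hosts
    return fence


def failover(old_active: str, methods: List[FencingMethod]) -> bool:
    """Return True if the standby may be promoted to active.

    Methods are tried in order and the first success ends the search;
    if none succeeds, the failover is abandoned.
    """
    for method in methods:
        if method(old_active):
            return True  # old active fenced: promotion can proceed
    return False  # every method failed: the standby stays in standby


# 'service network stop' on nn1: the SSH-based method cannot reach it.
print(failover("nn1", [make_ssh_like_fence(reachable_hosts=set())]))    # -> False
# Network restored on nn1: the same method now succeeds.
print(failover("nn1", [make_ssh_like_fence(reachable_hosts={"nn1"})]))  # -> True
```

Under these assumptions the standby is never promoted while `nn1` is unreachable, which would be consistent with the failover completing only once the network on the old active came back.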
>>  
>> Thanks,
>>  
>> Dave
