activemq-users mailing list archives

From Michael André Pearce <michael.andre.pea...@me.com>
Subject Re: 2 broker cluster, both brokers are live
Date Fri, 22 Sep 2017 18:45:04 GMT
Also, I am assuming you have already checked that the master is not GC’ing and suffering a long
pause due to garbage collection or something similar.

Sent from my iPhone
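One way to check for GC pauses is to compare a JVM's accumulated garbage-collection time before and after a window. The standard GarbageCollectorMXBeans (java.lang:type=GarbageCollector) expose per-collector counts and times over JMX, so they can be read from a running broker. The sketch below assumes JMX access to the broker's JVM; the class name GcTimeCheck, the 10-second window and the JMX URL in the comment are hypothetical, and the demo simply reads its own JVM through the platform MBean server. Collector time is an approximation rather than an exact stop-the-world measurement, so GC logs (for example -Xlog:gc* on JDK 9+ or -verbose:gc on older JVMs) remain the more reliable record of individual pauses.

```java
import java.io.IOException;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;

public class GcTimeCheck {
    /** Total accumulated collection time (ms) reported by the JVM behind the connection. */
    static long totalGcMillis(MBeanServerConnection conn) throws IOException {
        long total = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getPlatformMXBeans(conn, GarbageCollectorMXBean.class)) {
            long t = gc.getCollectionTime(); // -1 means this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Local platform MBean server, for demonstration only. For the broker, a remote
        // connection would be used instead (HOST/PORT are placeholders), e.g.:
        //   JMXConnectorFactory.connect(new JMXServiceURL(
        //       "service:jmx:rmi:///jndi/rmi://HOST:PORT/jmxrmi")).getMBeanServerConnection()
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        long windowMs = 10_000; // arbitrary sampling window for this sketch
        long before = totalGcMillis(conn);
        Thread.sleep(windowMs);
        long after = totalGcMillis(conn);
        System.out.printf("GC time during %d ms window: %d ms%n", windowMs, after - before);
    }
}
```

A jump of tens of seconds in collector time around 04:51, comparable to the 60,000 ms connection TTL in the logs, would point at GC; a flat value would make a GC pause an unlikely cause.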

> On 22 Sep 2017, at 19:43, Michael André Pearce <michael.andre.pearce@me.com> wrote:
> 
> https://activemq.apache.org/artemis/docs/latest/network-isolation.html
> 
>> On 22 Sep 2017, at 19:41, Michael André Pearce <michael.andre.pearce@me.com> wrote:
>> 
>> I am assuming you possibly had a temporary network fault, meaning the slave and master
>> could not talk to each other.
>> 
>> Have you configured the network pinger? If/when you have network issues that could cause
>> a split brain (master and slave cannot talk to each other), the nodes also ping another
>> device on the network. The idea is that the node which has actually lost the network will
>> fail that ping, so it knows it is the isolated one, which helps avoid this split-brain scenario.
>> 
>> 
>> Cheers
>> Mike 
>> 
>>> On 22 Sep 2017, at 17:49, boris_snp <boris.godunov@spglobal.com> wrote:
>>> 
>>> I have to restart my two-broker cluster every day because of the following
>>> sequence of events:
>>> -----------------------------------------------------------------------------------------------
>>> master
>>> 04:51:14,501    AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>> 04:51:14,510    AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed.]
>>> 04:51:24,517    AMQ212041: Timed out waiting for netty channel to close
>>> 04:51:24,517    AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58738 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>> -----------------------------------------------------------------------------------------------
>>> slave
>>> 04:51:42,306    AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@1c54a4bc[local= /10.202.147.99:58738, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>>> 04:51:42,316    AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@65ace922[local= /10.202.147.99:58739, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>>> 04:51:46,955    AMQ221037: ActiveMQServerImpl::serverUUID=7ffa29a0-7c48-11e7-9784-e83935127b09 to become 'live'
>>> 04:51:59,360    AMQ221014: 40% loaded
>>> 04:52:01,854    AMQ221014: 81% loaded
>>> 04:52:03,037    AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=8, messageNr=-1, recordID=8662153341] removing it from the journal
>>> 04:52:03,051    AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=13, messageNr=-1, recordID=8662204094] removing it from the journal
>>> 04:52:03,208    AMQ221003: Deploying queue jms.queue.DLQ
>>> 04:52:03,281    AMQ221003: Deploying queue jms.queue.ExpiryQueue
>>> 04:52:03,827    AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> master
>>> 04:52:03,827    AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> slave
>>> 04:52:03,910    AMQ221007: Server is now live
>>> 04:52:04,003    AMQ221020: Started Acceptor at nj09mhf0681:41411 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
>>> 04:52:11,949    AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> I understand that at some point the master (currently live) loses the slave and closes
>>> its connection to it.
>>> The slave (currently backup) in turn detects that the master is not present and becomes
>>> live. Now both brokers are live, and they never return to normal until I restart them.
>>> How can I avoid this? I would appreciate any help.
>>> Thank you.
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
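
The network pinger described in the reply above boils down to a simple rule: a node that cannot reach any independent address on the network assumes it is the isolated one. The sketch below models that rule roughly as the reply and the linked network-isolation page describe it: an isolated live stops, and a backup only becomes live if it has lost the live AND can still reach the network. It is a simplified, hypothetical model, not Artemis source code; the class NetworkCheckSketch, its enums and the 192.0.2.1 check address (a documentation-range placeholder) are assumptions, and InetAddress.isReachable is used for illustration only, which is not necessarily how Artemis performs its checks. In Artemis itself the pinger is enabled through broker configuration, as described on the linked page.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.List;

public class NetworkCheckSketch {
    enum Role { LIVE, BACKUP }

    enum Action { STAY_LIVE, STOP, STAY_BACKUP, BECOME_LIVE }

    /** True if at least one check address answers within the timeout. */
    static boolean networkCheckPassed(List<String> checkAddresses, int timeoutMs) {
        for (String address : checkAddresses) {
            try {
                // isReachable uses ICMP when permitted, otherwise a TCP attempt to port 7.
                if (InetAddress.getByName(address).isReachable(timeoutMs)) {
                    return true;
                }
            } catch (IOException e) {
                // Unresolvable or unreachable: try the next address.
            }
        }
        return false;
    }

    /** Simplified rule: a node that cannot reach the network assumes it is the isolated one. */
    static Action decide(Role role, boolean lostReplicationPeer, boolean networkOk) {
        switch (role) {
            case LIVE:
                // An isolated live steps down instead of carrying on alone.
                return networkOk ? Action.STAY_LIVE : Action.STOP;
            case BACKUP:
                // A backup takes over only if the live is gone AND the network looks healthy.
                return (lostReplicationPeer && networkOk) ? Action.BECOME_LIVE : Action.STAY_BACKUP;
            default:
                throw new IllegalArgumentException("unknown role: " + role);
        }
    }

    public static void main(String[] args) {
        // Hypothetical check target; in practice something like the subnet's default gateway.
        List<String> checkAddresses = List.of("192.0.2.1");
        boolean ok = networkCheckPassed(checkAddresses, 1_000);
        System.out.println("backup after losing live: " + decide(Role.BACKUP, true, ok));
        System.out.println("live after losing backup: " + decide(Role.LIVE, true, ok));
    }
}
```

Note that in this model, if both nodes can still reach the check address while losing each other, the backup activates and the live stays live, so the pinger only resolves the case where one node has genuinely lost the network.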
