activemq-users mailing list archives

From boris_snp <boris.godu...@spglobal.com>
Subject Cluster, both brokers are "live"
Date Fri, 22 Sep 2017 15:03:55 GMT
I have to restart my 2-broker cluster on a daily basis because of the
following sequence of events:
----------------------------------------------------------------------------------
master
04:51:14,501	AMQ212037: Connection failure has been detected: AMQ119014: Did
not receive data from /10.202.147.99:58739 within the 60,000ms connection
TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
04:51:14,510	AMQ222092: Connection to the backup node failed, removing
replication now:
ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT
message=AMQ119014: Did not receive data from /10.202.147.99:58739 within the
60,000ms connection TTL. The connection will now be closed.]
04:51:24,517	AMQ212041: Timed out waiting for netty channel to close
04:51:24,517	AMQ212037: Connection failure has been detected: AMQ119014: Did
not receive data from /10.202.147.99:58738 within the 60,000ms connection
TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
----------------------------------------------------------------------------------
slave
04:51:42,306	AMQ212037: Connection failure has been detected: AMQ119011: Did not receive
data from server for
org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@1c54a4bc[local=
/10.202.147.99:58738, remote=nj09mhf0681/10.202.147.99:41410]
[code=CONNECTION_TIMEDOUT]
04:51:42,316	AMQ212037: Connection failure has been detected: AMQ119011: Did not receive
data from server for
org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@65ace922[local=
/10.202.147.99:58739, remote=nj09mhf0681/10.202.147.99:41410]
[code=CONNECTION_TIMEDOUT]
04:51:46,955	AMQ221037:
ActiveMQServerImpl::serverUUID=7ffa29a0-7c48-11e7-9784-e83935127b09 to
become 'live'
04:51:59,360	AMQ221014: 40% loaded
04:52:01,854	AMQ221014: 81% loaded
04:52:03,037	AMQ222028: Could not find page cache for page PagePositionImpl
[pageNr=8, messageNr=-1, recordID=8662153341] removing it from the journal
04:52:03,051	AMQ222028: Could not find page cache for page PagePositionImpl
[pageNr=13, messageNr=-1, recordID=8662204094] removing it from the journal
04:52:03,208	AMQ221003: Deploying queue jms.queue.DLQ
04:52:03,281	AMQ221003: Deploying queue jms.queue.ExpiryQueue
04:52:03,827	AMQ212034: There are more than one servers on the network
broadcasting the same node id.
----------------------------------------------------------------------------------
master
04:52:03,827	AMQ212034: There are more than one servers on the network
broadcasting the same node id.
----------------------------------------------------------------------------------
slave
04:52:03,910	AMQ221007: Server is now live
04:52:04,003	AMQ221020: Started Acceptor at nj09mhf0681:41411 for protocols
[CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
04:52:11,949	AMQ212034: There are more than one servers on the network
broadcasting the same node id.
----------------------------------------------------------------------------------
As I understand it, at some point the master (currently live) loses contact
with the slave and closes the connection to it. The slave (currently the
backup) in turn detects that no master is available and becomes live itself.
Both brokers are then live and never recover from that state.
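
To make the ordering easier to follow, here is a minimal sketch that merges a few of the log lines above into one timeline and measures the gaps between them. The timestamps and AMQ codes are copied from the excerpts; the names EVENTS, parse, timeline and seconds_between are made up for illustration and are not part of Artemis. It assumes both logs share one clock, which seems reasonable since both brokers report the address 10.202.147.99.

```python
from datetime import datetime

# (broker, timestamp, code) tuples copied from the log excerpts in this post.
# The comments paraphrase the log text; they are not official code descriptions.
EVENTS = [
    ("slave", "04:51:42,306", "AMQ212037"),   # slave: connection failure detected
    ("master", "04:51:14,510", "AMQ222092"),  # master: removing replication
    ("slave", "04:52:03,910", "AMQ221007"),   # slave: "Server is now live"
    ("slave", "04:51:46,955", "AMQ221037"),   # slave: starting to become 'live'
    ("master", "04:52:03,827", "AMQ212034"),  # master: duplicate node id seen
]


def parse(ts):
    """Parse an 'HH:MM:SS,mmm' log timestamp (date part defaults to 1900-01-01)."""
    return datetime.strptime(ts, "%H:%M:%S,%f")


def timeline(events):
    """Return the events from both brokers sorted by timestamp."""
    return sorted(events, key=lambda event: parse(event[1]))


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps on the same day."""
    return (parse(end) - parse(start)).total_seconds()


if __name__ == "__main__":
    for broker, ts, code in timeline(EVENTS):
        print(f"{ts}  {broker:<6}  {code}")
    # Gap between the master dropping replication and the slave starting activation.
    print(f"{seconds_between('04:51:14,510', '04:51:46,955'):.3f} s")
    # Gap between the master dropping replication and the slave reporting live.
    print(f"{seconds_between('04:51:14,510', '04:52:03,910'):.3f} s")
```

By this reckoning the master drops replication about 32 seconds before the slave starts to activate, and the slave reports itself live about 49 seconds after the drop, while the master is still running.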
How can I avoid the restarts and have the brokers recover to a usable state
by themselves?
Thank you.




--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
