activemq-users mailing list archives

From glstephen <glstep...@gmail.com>
Subject ActiveMQ cluster fails with "server null" when the Zookeeper master node goes offline
Date Fri, 18 Dec 2015 15:13:49 GMT
I have encountered an issue with ActiveMQ where the entire cluster will fail
when the master Zookeeper node goes offline.

We have a 3-node ActiveMQ cluster set up in our development environment. Each
node runs ActiveMQ 5.12.0 and ZooKeeper 3.4.6 (*note:* we have done some
testing with ZooKeeper 3.4.7, but this did not resolve the issue; time
constraints have so far prevented us from testing ActiveMQ 5.13).

What we have found is that when we stop the master ZooKeeper process (via
the "end process tree" command in Task Manager), the remaining two ZooKeeper
nodes continue to function as normal. Sometimes the ActiveMQ cluster is able
to handle this, but sometimes it is not.

When the cluster fails, we typically see this in the ActiveMQ log:

2015-12-18 09:08:45,157 | WARN  | Too many cluster members are connected. 
Expected at most 3 members but there are 4 connected. |
org.apache.activemq.leveldb.replicated.MasterElector |
WrapperSimpleAppMain-EventThread
...
...
2015-12-18 09:27:09,722 | WARN  | Session 0x351b43b4a560016 for server null,
unexpected error, closing socket connection and attempting reconnect |
org.apache.zookeeper.ClientCnxn |
WrapperSimpleAppMain-SendThread(192.168.0.10:2181)
java.net.ConnectException: Connection refused: no further information
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)[:1.7.0_79]
	at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)[:1.7.0_79]
	at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)[zookeeper-3.4.6.jar:3.4.6-1569965]
	at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)[zookeeper-3.4.6.jar:3.4.6-1569965]
	
We were immediately concerned by two things: (A) ActiveMQ seems to think
there are four members in the cluster when it is only configured with three, and
(B) when the exception is raised, the server appears to be null. We then
increased ActiveMQ's logging level to DEBUG in order to display the list of
members:

2015-12-18 09:33:04,236 | DEBUG | ZooKeeper group changed: Map(localhost ->
ListBuffer((0000000156,{"id":"localhost","container":null,"address":null,"position":-1,"weight":5,"elected":null}),
(0000000157,{"id":"localhost","container":null,"address":null,"position":-1,"weight":1,"elected":null}),
(0000000158,{"id":"localhost","container":null,"address":"tcp://192.168.0.11:61619","position":-1,"weight":10,"elected":null}),
(0000000159,{"id":"localhost","container":null,"address":null,"position":-1,"weight":10,"elected":null})))
| org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ
BrokerService[localhost] Task-14
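
(For reference, DEBUG was enabled through the broker's conf/log4j.properties,
roughly along the lines below; the exact logger name is inferred from the
class names in the log output rather than copied from our actual file:)

# conf/log4j.properties -- raise the replication/elector logging to DEBUG
log4j.logger.org.apache.activemq.leveldb.replicated=DEBUG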

Can anyone suggest why this may be happening and/or a way to resolve it? Our
configurations are shown below:

*ZooKeeper:*
tickTime=2000
dataDir=C:\\zookeeper-3.4.7\\data
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.0.10:2888:3888
server.2=192.168.0.11:2888:3888
server.3=192.168.0.12:2888:3888
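
(Each node's dataDir also contains the standard myid file matching its
server.N entry; on server.1, for example, C:\zookeeper-3.4.7\data\myid
contains just:)

1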

*ActiveMQ (server.1):*
<persistenceAdapter>    
    <replicatedLevelDB
        directory="activemq-data"
        replicas="3"
        bind="tcp://0.0.0.0:61619"
        zkAddress="192.168.0.11:2181,192.168.0.10:2181,192.168.0.12:2181"
        zkPath="/activemq/leveldb-stores"
        hostname="192.168.0.10"
        weight="5"/>
        <!-- server.2 has a weight of 10, server.3 has a weight of 1 -->
</persistenceAdapter>
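
*ActiveMQ (server.2, for comparison):*
(Assuming the brokers differ only in hostname and weight; the hostname here is
inferred from the tcp://192.168.0.11:61619 address in the DEBUG output above.)
<persistenceAdapter>
    <replicatedLevelDB
        directory="activemq-data"
        replicas="3"
        bind="tcp://0.0.0.0:61619"
        zkAddress="192.168.0.11:2181,192.168.0.10:2181,192.168.0.12:2181"
        zkPath="/activemq/leveldb-stores"
        hostname="192.168.0.11"
        weight="10"/>
</persistenceAdapter>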



