activemq-users mailing list archives

From Josh Carlson <jcarl...@e-dialog.com>
Subject ActiveMQ Broker Failover
Date Thu, 11 Nov 2010 19:11:13 GMT
We are using version 5.3.0 with a shared file system master/slave configuration and persistent
messaging with client acknowledgements. An NFSv4 mount point is used for both the lock file
and the persistent storage. KahaDB is used as the persistence adapter.
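
For anyone not familiar with this setup: as I understand it, the master is simply whichever broker holds an exclusive lock on the lock file, and the slaves keep retrying until they get it. Below is a minimal sketch of that idea in plain Java. It is not ActiveMQ's code; the class name LockFileSketch, the helper tryAcquire and the temp file standing in for the lock file on the NFS mount are all made up for illustration, and both "brokers" run in one JVM here, so it says nothing about how NFS itself behaves.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockFileSketch {

    /**
     * Tries once to take an exclusive lock on the whole file.
     * Returns null if the lock is already held elsewhere.
     */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            // null means another process holds the lock
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // the lock is already held inside this JVM
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the lock file on the shared mount.
        Path lockFile = Files.createTempFile("broker", ".lock");
        try (FileChannel master = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel slave = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {

            // The first broker gets the lock and would become master.
            FileLock masterLock = tryAcquire(master);
            System.out.println("master has lock: " + (masterLock != null));

            // The second broker cannot get it while the master holds it.
            FileLock slaveLock = tryAcquire(slave);
            System.out.println("slave has lock: " + (slaveLock != null));

            // Once the master's lock is gone, the slave's next attempt succeeds.
            if (masterLock != null) {
                masterLock.release();
            }
            slaveLock = tryAcquire(slave);
            System.out.println("slave has lock after release: " + (slaveLock != null));
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```

In this picture a slave can only take over if the master's lock has gone away, which is what seems to happen across the NFS restart.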

We have encountered issues where the broker does not fail over gracefully whenever there is
a problem with the NFS server. The most reliable test case I have come up with is stopping
and starting the NFS server. When the NFS server is restarted, one of the slaves acquires the
lock and becomes the master, but the original master stays active and keeps listening for connections.
Clients can successfully connect to it and subscribe to queues (but no messages get dispatched),
and enqueues hang until the socket times out. Connections that go to the new master
work. Hence the questions:

	Why was the lock released? Shouldn't it have been retained?

	Why isn't the original master dispatching messages, and why is it blocking sends?

I have seen other issues but have not been able to reproduce them reliably:

	* NFS timeout due to a DNS issue
	* Possible Linux kernel bug. The problem arises when /var/log/messages shows:
	  kernel: decode_op_hdr: reply buffer overflowed in line 2121.<6>      blocks= 585871964 block_size= 512

Any help would be appreciated.

Thanks

Josh

