activemq-users mailing list archives

From wazburrows <>
Subject Colocated fail-back not working correctly
Date Fri, 04 Oct 2019 21:45:53 GMT

I have a 4-node Artemis 2.10 cluster on Linux configured for replication with
colocated HA servers. I have been testing failover and fail-back, but it's
not working as I would expect. When I shut down one server (A) in an HA
pair, the colocated backup on the second server (B) activates and processes
messages for the original server A. It doesn't process all of the messages
sent, though, but that's a different problem.

The problems start when I bring the original server A up again. Server A
starts, becomes live and joins the cluster, but the console no longer shows a
collocated_backup_1 entry indicating that it is providing a colocated backup
for server B. It also seems to cause server B, the server that was failed
over to, to go offline and stop being "live". Server B doesn't show
collocated_backup_1 in its console either. Server B still seems to be part of
the cluster, but the UI no longer shows a green master node for it, just a red
slave node circle. Server B doesn't list any addresses or acceptors in the UI,
and connections to it fail. It looks as if it has shut down its "live" server
and is running as a backup only. If I shut server B down and bring it back
up, the roles are swapped: server B becomes live again and is shown as a
master node (still with no collocated_backup_01, though), and server A goes
offline and appears only as a slave node in the UI. Whichever of A or B is in
this "offline", backup-only state, the Node property in the cluster attributes
shown in the UI has the same value on both servers. Before the failover test
they have different node IDs, which makes sense.
So fail-back appears to be broken: after a failover event, only one of the
two nodes in the HA pair is ever live at a time.

The only way to fix the pair is to stop both servers and remove the
"colocated" backup directories from the data directory on both boxes before
starting them again. At that point they come up correctly, both are live,
and they pair up as HA backups for each other again.
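
For anyone who wants to script that cleanup, here is a minimal Python sketch.
It assumes the colocated backup's directories sit directly under the broker's
data directory and have "colocated" in their names; the function name
remove_colocated_backup_dirs and the example path are hypothetical, so check
the actual directory names on your boxes first, and only run it while the
broker is stopped.

```python
import shutil
from pathlib import Path
from typing import List


def remove_colocated_backup_dirs(data_dir: str) -> List[str]:
    """Delete subdirectories of data_dir whose names contain 'colocated'.

    Assumption (hypothetical layout): the colocated backup's journal,
    bindings, paging and large-message directories live directly under
    the broker's data directory with 'colocated' in their names.
    Only run this while the broker is stopped.
    """
    removed = []
    root = Path(data_dir)
    # Sort for a deterministic order; plain files are left untouched.
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and "colocated" in entry.name:
            shutil.rmtree(entry)  # recursively delete the backup directory
            removed.append(entry.name)
    return removed
```

Usage, on each box with the broker stopped (path is only an example):
remove_colocated_backup_dirs("/path/to/broker/data"). Returning the removed
names makes it easy to log what was deleted before restarting the pair.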
