activemq-users mailing list archives

From Ilkka Virolainen <Ilkka.Virolai...@bitwise.fi>
Subject RE: Artemis 2.5.0 - Problems with colocated scaledown
Date Wed, 21 Mar 2018 06:47:13 GMT
It looks like the issues were caused by Artemis somehow not always having a complete cluster
topology after a sequence of shutdown/scaledown and failback. I changed the cluster connections
to use UDP discovery/broadcast groups instead of static TCP connectors, which appears to work
around the underlying issue.

- Ilkka

-----Original Message-----
From: Ilkka Virolainen <Ilkka.Virolainen@bitwise.fi> 
Sent: 14 March 2018 11:08
To: users@activemq.apache.org
Subject: RE: Artemis 2.5.0 - Problems with colocated scaledown

With the tcp-connectors excluded and only the invm-connectors left in the ha-policy, I see the
following behavior after server0 has been shut down and restarted:

Server0 logs in an infinite loop:

...
2018-03-14 11:04:56,976 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-14 11:04:56,981 INFO  [org.apache.activemq.artemis.core.server] AMQ221060: Sending quorum vote request to localhost/127.0.0.1:61618: RequestBackupVote [backupsSize=-1, nodeID=null, backupAvailable=false]
2018-03-14 11:04:56,983 INFO  [org.apache.activemq.artemis.core.server] AMQ221061: Received quorum vote response from localhost/127.0.0.1:61618: RequestBackupVote [backupsSize=1, nodeID=82925fbd-275e-11e8-bff4-0a0027000011, backupAvailable=false] ...

Server1 logs in an infinite loop:

...
2018-03-14 11:04:51,982 INFO  [org.apache.activemq.artemis.core.server] AMQ221062: Received quorum vote request: RequestBackupVote [backupsSize=-1, nodeID=null, backupAvailable=false]
2018-03-14 11:04:51,983 INFO  [org.apache.activemq.artemis.core.server] AMQ221063: Sending quorum vote response: RequestBackupVote [backupsSize=1, nodeID=82925fbd-275e-11e8-bff4-0a0027000011, backupAvailable=false] ...

Why does an endless, unsuccessful backup vote take place with backupsSize=-1 and
nodeID=null?
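To compare vote attempts it helps to pull the three RequestBackupVote fields out of the AMQ221060/AMQ221061 lines. The Java sketch below does only that; the class BackupVoteLogParser and its Vote type are hypothetical helpers, not Artemis API, and treating backupsSize=-1 with nodeID=null as the request's unset defaults is an assumption, not documented behavior. In the lines above, every request carries those values and every response from 127.0.0.1:61618 reports backupAvailable=false.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Artemis: extracts the fields that the
// AMQ221060-AMQ221063 log lines print for a RequestBackupVote.
final class BackupVoteLogParser {

    /** Fields of one "RequestBackupVote [...]" fragment. */
    static final class Vote {
        final int backupsSize;
        final String nodeId; // null when the log prints "nodeID=null"
        final boolean backupAvailable;

        Vote(int backupsSize, String nodeId, boolean backupAvailable) {
            this.backupsSize = backupsSize;
            this.nodeId = nodeId;
            this.backupAvailable = backupAvailable;
        }

        // Assumption: -1 and null are the request's unset defaults.
        boolean isPlaceholderRequest() {
            return backupsSize == -1 && nodeId == null;
        }

        @Override
        public String toString() {
            return "Vote[backupsSize=" + backupsSize + ", nodeID=" + nodeId
                    + ", backupAvailable=" + backupAvailable + "]";
        }
    }

    // Matches e.g. "RequestBackupVote [backupsSize=-1, nodeID=null, backupAvailable=false]".
    private static final Pattern VOTE = Pattern.compile(
            "RequestBackupVote \\[backupsSize=(-?\\d+), nodeID=([^,\\]]+), backupAvailable=(true|false)\\]");

    /** Returns the parsed vote, or null if the line contains no RequestBackupVote. */
    static Vote parse(String logLine) {
        Matcher m = VOTE.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        String node = m.group(2).trim();
        return new Vote(Integer.parseInt(m.group(1)),
                "null".equals(node) ? null : node,
                Boolean.parseBoolean(m.group(3)));
    }

    public static void main(String[] args) {
        String[] lines = {
            "AMQ221060: Sending quorum vote request to localhost/127.0.0.1:61618: "
                + "RequestBackupVote [backupsSize=-1, nodeID=null, backupAvailable=false]",
            "AMQ221061: Received quorum vote response from localhost/127.0.0.1:61618: "
                + "RequestBackupVote [backupsSize=1, nodeID=82925fbd-275e-11e8-bff4-0a0027000011, backupAvailable=false]"
        };
        for (String line : lines) {
            Vote v = parse(line);
            System.out.println(v + " placeholder=" + v.isPlaceholderRequest());
        }
    }
}
```

Run over a whole log, this makes it easy to check whether any response ever reports backupAvailable=true.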

Best regards,
- Ilkka

-----Original Message-----
From: Ilkka Virolainen [mailto:Ilkka.Virolainen@bitwise.fi]
Sent: 13 March 2018 16:46
To: users@activemq.apache.org
Subject: RE: Artemis 2.5.0 - Problems with colocated scaledown

Part of my problem was on the client side, but the scaledown issue is still unresolved. The
client connectivity issues seem to be related to the scaledown issues. To reproduce the client
connectivity problem: start both brokers, then connect with a 1.5.4 client using
tcp://localhost:61616 and send a message to a topic. Now shut down server0; it scales down
to server1. Sending a message from the client now fails, even though a failover should have
occurred. Restarting server0 results in the infinite backup quorum vote.

Could someone clarify whether the fault lies in the broker configurations (ref. [1]) or in
Artemis itself? I'm aiming for a symmetric, statically defined cluster of two nodes, each
storing a backup of the other's data. When one node is shut down, its data should be made
available on the remaining live broker and clients should fail over to it. When the other
broker is brought back online, replication should continue normally.

The documentation and examples give the impression that in-VM connectors/acceptors are needed
for scaledown and for synchronization between a slave storing the backup and the colocated
live master that the backup would be scaled down to. In any case, I have so far been unable
to resolve these issues by trying out different HA options.

Best regards,
- Ilkka

[1] Reference broker configuration https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq

-----Original Message-----
From: Ilkka Virolainen [mailto:Ilkka.Virolainen@bitwise.fi]
Sent: 9 March 2018 14:21
To: users@activemq.apache.org
Subject: Artemis 2.5.0 - Problems with colocated scaledown

Hello,

I have some issues with scaledown of colocated servers. I have a symmetric, statically defined
cluster of two colocated nodes configured with scale down. The problem occurs as follows:

1. Start both brokers. They form a connection and replicate.

2. Close server1.
-> The server shuts down; server0 detects the shutdown and scales down from the replicated backup.

3. Start server1.
->
Server0 logs:
2018-03-09 10:57:57,434 WARN  [org.apache.activemq.artemis.core.server] AMQ222138: Local Member is not set at on ClusterConnection ClusterConnectionImpl@914942811[nodeUUID=1ed6bd4b-2377-11e8-a9e2-0a0027000011, connector=TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=localhost&activemq-passwordcodec=****, address=, server=ActiveMQServerImpl::serverUUID=1ed6bd4b-2377-11e8-a9e2-0a0027000011]

Server1 logs in an infinite loop:

2018-03-09 11:00:57,162 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:02,156 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:07,154 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:12,153 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:17,152 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:22,153 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:27,152 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:32,149 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote ...

The situation only normalizes when server1 is shut down and restarted.
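The vote attempts above are spaced almost exactly 5 seconds apart. The Java sketch below measures that spacing from the timestamp at the start of each AMQ221066 line; the class VoteRetryInterval is a hypothetical helper, not part of Artemis. As an unverified inference: the Artemis HA documentation lists `backup-request-retries` (default -1, meaning retry forever) and `backup-request-retry-interval` (default 5000 ms) for the colocated ha-policy, so a policy that leaves both at their defaults would produce exactly this endless, 5-second loop. Check the linked broker configurations before relying on that reading.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Artemis: measures the spacing between
// consecutive "AMQ221066: Initiating quorum vote" entries in a broker log.
final class VoteRetryInterval {

    // The log lines in this thread start with a timestamp such as "2018-03-09 11:00:57,162".
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int TIMESTAMP_LENGTH = 23;

    /** Returns the gaps between consecutive AMQ221066 entries, in log order. */
    static List<Duration> gaps(List<String> logLines) {
        List<Duration> result = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : logLines) {
            if (!line.contains("AMQ221066")) {
                continue; // ignore everything except vote initiations
            }
            LocalDateTime at = LocalDateTime.parse(line.substring(0, TIMESTAMP_LENGTH), TIMESTAMP);
            if (previous != null) {
                result.add(Duration.between(previous, at));
            }
            previous = at;
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps copied from the server1 log above.
        String[] stamps = {
            "2018-03-09 11:00:57,162", "2018-03-09 11:01:02,156", "2018-03-09 11:01:07,154",
            "2018-03-09 11:01:12,153", "2018-03-09 11:01:17,152", "2018-03-09 11:01:22,153",
            "2018-03-09 11:01:27,152", "2018-03-09 11:01:32,149"
        };
        List<String> log = new ArrayList<>();
        for (String ts : stamps) {
            log.add(ts + " INFO  [org.apache.activemq.artemis.core.server] "
                    + "AMQ221066: Initiating quorum vote: RequestBackupQuorumVote");
        }
        for (Duration gap : gaps(log)) {
            System.out.println(gap.toMillis() + " ms");
        }
    }
}
```

On the eight timestamps above it reports seven gaps, all between 4994 ms and 5001 ms.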

Broker configurations for reproducing the issue: https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq

I also have a separate issue that I have so far been unable to reproduce locally. When the
brokers are deployed on two different physical servers and one node shuts down, the other
stops accepting connections. Clients attempting to connect log:

org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119013: Timed out waiting to receive cluster topology. Group:null

I don't understand why this happens, or why it doesn't happen locally. The cluster topology
should already be known to every node involved. I understand it's difficult to comment on
this without a way to reproduce it, but perhaps someone has come across this situation
before?

Best regards,
- Ilkka

