activemq-issues mailing list archives

From "Catalin Alexandru Zamfir (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ARTEMIS-1285) Standby slave would not announce replication to master when the slave is down
Date Tue, 17 Apr 2018 20:22:00 GMT

    [ https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441434#comment-16441434 ]

Catalin Alexandru Zamfir edited comment on ARTEMIS-1285 at 4/17/18 8:21 PM:
----------------------------------------------------------------------------

The example is simple. Our set-up uses JGroups TCPPING discovery with "initial_hosts" set
to the live nodes only. To avoid the JGroups "no physical address for node: UUID" warning,
we have set "send_cache_on_join" and "return_entire_cache" to true on the TCPPING protocol
in the JGroups configuration.
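
For reference, below is a minimal sketch of what such a TCPPING stack can look like. This
is an illustration under assumptions, not our actual file: the host names, port and the
surrounding protocols are placeholders; only initial_hosts, send_cache_on_join and
return_entire_cache reflect the settings discussed above.
{code:xml}
<!-- Hypothetical minimal TCP/TCPPING stack; live1/live2 and port 7800 are placeholders -->
<config xmlns="urn:org:jgroups">
  <TCP bind_port="7800"/>
  <!-- static discovery over the live nodes only, with the cache options that
       avoid the "no physical address for node: UUID" warning -->
  <TCPPING initial_hosts="live1[7800],live2[7800]"
           port_range="0"
           send_cache_on_join="true"
           return_entire_cache="true"/>
  <MERGE3/>
  <FD_SOCK/>
  <FD_ALL/>
  <VERIFY_SUSPECT/>
  <pbcast.NAKACK2 use_mcast_xmit="false"/>
  <UNICAST3/>
  <pbcast.STABLE/>
  <pbcast.GMS/>
  <MFC/>
  <FRAG2/>
</config>
{code}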

Below are the master/slave configurations (identical for both backups). The JGroups TCPPING
discovery/broadcast configuration sits above these blocks. The cluster itself works: it was
tested with ./artemis producer/consumer CLI commands from different nodes on different
physical machines.
{code:xml}
<!-- on master (live) -->
<ha-policy>
  <replication>
    <master>
      <!-- on start-up, check whether another live server already holds our node ID -->
      <check-for-live-server>true</check-for-live-server>
      <group-name>g1</group-name>
      <initial-replication-sync-timeout>15000</initial-replication-sync-timeout>
      <cluster-name>shared-artemis-cluster</cluster-name>
      <vote-on-replication-failure>true</vote-on-replication-failure>
    </master>
  </replication>
</ha-policy>

<!-- on replicas (2x) -->
<ha-policy>
  <replication>
    <slave>
      <!-- let the backup stop and return to standby when the original live comes back -->
      <allow-failback>true</allow-failback>
      <group-name>g1</group-name>
      <initial-replication-sync-timeout>15000</initial-replication-sync-timeout>
      <cluster-name>shared-artemis-cluster</cluster-name>
      <!-- quorum vote: 12 retries, 5000 ms apart -->
      <vote-retries>12</vote-retries>
      <vote-retry-wait>5000</vote-retry-wait>
    </slave>
  </replication>
</ha-policy>{code}
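
(For completeness, since the JGroups wiring is elided above: the broker.xml side of such a
setup would look roughly like the sketch below. The connector, file, channel and group names
are hypothetical placeholders; the only real value is the cluster-connection name, which the
<cluster-name> in the ha-policy blocks must match.)
{code:xml}
<!-- Hypothetical discovery/broadcast wiring; "jgroups-tcpping.xml", the channel
     name and "netty-connector" are placeholders -->
<broadcast-groups>
  <broadcast-group name="bg-group1">
    <jgroups-file>jgroups-tcpping.xml</jgroups-file>
    <jgroups-channel>artemis_channel</jgroups-channel>
    <connector-ref>netty-connector</connector-ref>
  </broadcast-group>
</broadcast-groups>

<discovery-groups>
  <discovery-group name="dg-group1">
    <jgroups-file>jgroups-tcpping.xml</jgroups-file>
    <jgroups-channel>artemis_channel</jgroups-channel>
    <refresh-timeout>10000</refresh-timeout>
  </discovery-group>
</discovery-groups>

<cluster-connections>
  <!-- the name here is what <cluster-name> in the ha-policy refers to -->
  <cluster-connection name="shared-artemis-cluster">
    <connector-ref>netty-connector</connector-ref>
    <discovery-group-ref discovery-group-name="dg-group1"/>
  </cluster-connection>
</cluster-connections>
{code}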
Note that I repeated the fresh install several times (using Ansible, Docker and fresh LVs
on LVM; everything is purged and reinstalled). On every fresh install, "r3" enters the loop.
But any manual intervention (e.g. a restart of r3) makes it behave normally: it stays in
standby until the master is manually stopped, at which point it becomes the backup for r2.

Looks like some sort of cluster "initial state" conflict, maybe related to TCPPING + JGroups
in this setup. Sadly I can't use multicast (UDP) on our network to compare against a different
discovery behaviour. I have all 3 hawtio management consoles open while installing the fresh
cluster: one reports live, another reports backup, and the third throws "Broker is stopped"
exceptions when trying to view any attributes.

It's late here; I'll take this for a spin again tomorrow. If I can provide any more
information, please ask. Thanks!



> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
>                 Key: ARTEMIS-1285
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.1.0
>            Reporter: yangwei
>            Priority: Major
>
> We have a cluster of 3 instances: A is master, B is slave and C is standby slave. When
> B (the slave) is down, we expect C to announce replication to A, but A stays in standalone
> mode the whole time. A jstack dump shows C waiting at "nodeLocator.locateNode()".



