activemq-users mailing list archives

From: Justin Bertram <jbert...@apache.com>
Subject: Re: Problems setting up replicated ha-policy.
Date: Tue, 31 Jan 2017 14:07:21 GMT
Replication, as currently implemented, only supports replicating to a single backup at a time
(as noted previously).  

One can configure multiple backups for a single live, but only one of those slaves will receive
the replicated data.  If the live dies then the replica backup will become live and begin
replicating to one of the additional backups.
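
As a rough illustration of the setup described above, each backup's broker.xml might carry an ha-policy along these lines. This is a minimal sketch based on the Artemis 1.x replication configuration, not a copy of anyone's actual file; the group name "pair-a" is hypothetical, and the live broker would need the same value on its <master> element.

```xml
<!-- broker.xml fragment for each backup (sketch; "pair-a" is a hypothetical group name) -->
<ha-policy>
   <replication>
      <slave>
         <!-- only pair with a live broker that declares the same group-name -->
         <group-name>pair-a</group-name>
         <!-- hand control back when the original live broker returns -->
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>
```

Several backups can share this configuration, but only one of them receives replicated data at any given time.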

Replicating to many backups, even if it were supported, might not be desirable from a
performance perspective, since each replica requires a network hop and a disk sync.


Justin

----- Original Message -----
From: "Tim Bain" <tbain@alumni.duke.edu>
To: "ActiveMQ Users" <users@activemq.apache.org>
Sent: Tuesday, January 31, 2017 7:52:12 AM
Subject: Re: Problems setting up replicated ha-policy.

Justin,

Is there a use case where replicating to only some of the slaves in a
cluster with N >= 3 is desirable, or does this just mean that replication
as currently implemented is recommended only for N = 2?

Tim

On Jan 31, 2017 6:45 AM, "Justin Bertram" <jbertram@apache.com> wrote:

Your assumption is correct, and there is no way currently to replicate to
multiple slaves concurrently.  Replication occurs only to a single slave at
a time.  I believe there is a JIRA for implementing this feature (or
something similar), but I can't locate it at the moment.


Justin

----- Original Message -----
From: "Gerrit Tamboer" <Gerrit.Tamboer@crv4all.com>
To: users@activemq.apache.org
Sent: Tuesday, January 31, 2017 2:53:03 AM
Subject: Re: Problems setting up replicated ha-policy.

Hi Clebert, Justin,

Thanks a bunch for the good feedback; it helped me a lot.

One final question.
In a 3 node cluster setup with HA-policy, there is one active master and 2
slaves. One of these slaves is the active replicating slave and the other
is a hot-standby (basically). So if the active slave fails, the standby
slave will pick up replication. That however means that replication only
starts when a slave is the active slave, and the other slave does not
replicate anything and waits until the other slave goes down. Correct? So
if my master and active slave burn down, I have data loss because the
passive slave was not actively replicating?

If my assumption is correct, is there a way to have both slaves actively
replicating the data?

Regards,
Gerrit

On 30/01/17 20:16, "Clebert Suconic" <clebert.suconic@gmail.com> wrote:

As Justin pointed out, look at the Network Health Check, or use
better infrastructure to avoid split brain.

On Mon, Jan 30, 2017 at 11:48 AM, Justin Bertram <jbertram@apache.com>
wrote:
>> It does what I think it does, now my slave and my master are active.
>> This however is acceptable, no problems yet.
>
> Actually, this is a problem.  This is the classic split-brain scenario.
> Since both your master and slave are active with the same messages you will
> lose data integrity.  Once the network connection between the live and (now
> active) backup is restored there is nothing which can be done to
> re-integrate the data since there is no way of knowing which broker has the
> right data.  This is the risk you run with a single live and backup.  To
> mitigate the risk of split-brain you have a couple of options:
>
>   1) Invest in redundant network infrastructure (e.g. multiple NICs on
>      each machine, redundant network switches, etc.).  Obviously you'll need to
>      perform a cost/risk analysis here to determine how much your data is
>      actually worth.
>   2) Configure a larger cluster of live/backup pairs so that if a
>      connection between nodes is lost a quorum vote can (hopefully) prevent the
>      illegitimate activation of a backup.
>   3) Similar to #2 you can use the recently added "network check"
>      functionality [1].
>
>
> Justin
>
>
> [1] http://activemq.apache.org/artemis/docs/1.5.2/network-isolation.html
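
For readers who want a concrete picture of option 3, the network check is configured in the core section of broker.xml. The fragment below is a minimal sketch based on the network-isolation documentation linked in [1]; the address 10.0.0.1 is a placeholder and the period and timeout values are only illustrative.

```xml
<!-- broker.xml fragment (sketch; 10.0.0.1 is a placeholder address) -->
<core xmlns="urn:activemq:core">
   <!-- comma-separated addresses that must stay reachable, for example a gateway -->
   <network-check-list>10.0.0.1</network-check-list>
   <!-- how often to run the check, in milliseconds -->
   <network-check-period>10000</network-check-period>
   <!-- how long to wait for each ping, in milliseconds -->
   <network-check-timeout>1000</network-check-timeout>
</core>
```

The idea is that a broker which can no longer reach any of the listed addresses treats itself as isolated and stops serving until the network returns, which reduces the chance of both nodes being active at once.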
>
> ----- Original Message -----
> From: "Gerrit Tamboer" <Gerrit.Tamboer@crv4all.com>
> To: users@activemq.apache.org
> Sent: Monday, January 30, 2017 10:03:42 AM
> Subject: Re: Problems setting up replicated ha-policy.
>
> Hi Clebert,
>
> Thanks for pointing me in the right direction, I was able to set up
> replication with active/passive failover.
>
> I am able to stop the master or kill the master and the slave is
> responding to it. If I start up the master again the slave replicates back
> to master and the master becomes active. So far so good.
>
> So what I simulated now is a network outage. I did this by simply making
> sure that the master cannot connect to the slave and vice versa
> (VirtualBox, setting the network adapter to disabled).
> It does what I think it does, now my slave and my master are active. This
> however is acceptable, no problems yet. But when I enable the network
> adapter again, making sure the master and slave can connect, it does not do
> a failback. The slave stays active, as well as the master, and they don’t
> seem to communicate. Is this some sort of split-brain situation?
>
> Regards,
> Gerrit
>
>
> On 27/01/17 21:25, "Clebert Suconic" <clebert.suconic@gmail.com> wrote:
>
> The only issue I found is how you are defining this:
>
> <connector name="localhost">tcp://localhost:61616</connector>
>
> On the cluster connection you are passing localhost as the node address.
> That address is sent to the backup, and the backup will then try to connect
> to localhost, which is itself, so it won't actually connect to the other node.
>
>
> You should pass in an IP address that is reachable from the second node.
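
A minimal sketch of what that fix could look like on the live node, assuming a static cluster connection similar to the replicated-failback-static example. The addresses 192.168.1.10 and 192.168.1.11, and the connector and cluster-connection names, are placeholders rather than values from the original broker.xml files.

```xml
<!-- broker.xml fragment on the live node (sketch; addresses and names are placeholders) -->
<connectors>
   <!-- address this broker advertises; it must be reachable from the other node -->
   <connector name="live-connector">tcp://192.168.1.10:61616</connector>
   <!-- address of the backup node -->
   <connector name="backup-connector">tcp://192.168.1.11:61616</connector>
</connectors>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <!-- the connector that is sent to other nodes so they can connect back -->
      <connector-ref>live-connector</connector-ref>
      <static-connectors>
         <connector-ref>backup-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```

The backup node would mirror this, advertising its own address and listing the live node's connector under static-connectors.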
>
>
> Hope this helps...
>
>
> Look at the  examples/features/ha/replicated-failback-static example
>
> On Fri, Jan 27, 2017 at 9:28 AM, Clebert Suconic
> <clebert.suconic@gmail.com> wrote:
>> I won't be able to get to a computer today. Only on Monday.
>>
>>
>> Meanwhile can you compare your config with the replicated examples from
>> the release? That's what I would do anyway.
>>
>>
>> Try with a single live/backup.  Make sure the ID matches on the backup so
>> it can pull the data.
>>
>> Let me know how it goes. I may find time to get to a computer this
>> afternoon.
>>
>> On Fri, Jan 27, 2017 at 5:32 AM Gerrit Tamboer
>> <Gerrit.Tamboer@crv4all.com> wrote:
>>>
>>> Hi Clebert,
>>>
>>> Thanks for pointing this out.
>>>
>>> I just tested 1.5.2 but unfortunately the results are exactly the same.
>>> No failover situation although the slave sees the master going down. The
>>> slave does not even notice a master being gone after a kill -9.
>>>
>>> This leads me to believe I have a misconfiguration, because if this is
>>> designed to work like this, it’s not really HA.
>>>
>>> I have added the broker.xml files of all nodes to this mail again, hopefully
>>> somebody has a similar setup and can verify the configuration.
>>>
>>> Thanks a bunch!
>>>
>>> Regards,
>>> Gerrit Tamboer
>>>
>>>
>>> On 27/01/17 04:33, "Clebert Suconic" <clebert.suconic@gmail.com> wrote:
>>>
>>> Until recently (1.5.0), only the TTL decided when to activate the
>>> backup.
>>>
>>>
>>> More recent versions also take connection failures into account when
>>> deciding whether to activate it.
>>>
>>>
>>> So on 1.3.0 you will be bound to the TTL of the cluster connection.
>>>
>>>
>>> On 1.5.2 it should work with kill, but you would still be bound to the TTL
>>> in case of a cable cut or a switch being turned off; that's the nature of TCP/IP.
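
For reference, the TTL mentioned here is set on the cluster connection in broker.xml. The fragment below is only a sketch: the element names follow the Artemis cluster-connection configuration, while the connector names and the millisecond values are illustrative assumptions, not recommendations.

```xml
<!-- broker.xml fragment (sketch; names and values are illustrative) -->
<cluster-connection name="my-cluster">
   <connector-ref>live-connector</connector-ref>
   <!-- how often the connection is checked, in milliseconds -->
   <check-period>1000</check-period>
   <!-- how long a silent connection is kept before it is considered dead, in milliseconds -->
   <connection-ttl>5000</connection-ttl>
   <static-connectors>
      <connector-ref>backup-connector</connector-ref>
   </static-connectors>
</cluster-connection>
```

A shorter TTL lets the backup notice a silent failure sooner, at the cost of a higher risk of reacting to a brief network hiccup.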
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jan 26, 2017 at 7:24 AM Gerrit Tamboer
>>> <Gerrit.Tamboer@crv4all.com> wrote:
>>>
>>> > Forgot to send the attachments!
>>> >
>>> >
>>> >
>>> > *From: *Gerrit Tamboer <Gerrit.Tamboer@crv4all.com>
>>> > *Date: *Thursday 26 January 2017 at 13:23
>>> > *To: *"users@activemq.apache.org" <users@activemq.apache.org>
>>> > *Subject *Problems setting up replicated ha-policy.
>>> >
>>> >
>>> >
>>> > Hi community,
>>> >
>>> >
>>> >
>>> > We are attempting to set up a 3 node Artemis (1.3.0) cluster with an
>>> > active-passive failover situation. We see that the master node is
>>> > actively accepting connections:
>>> >
>>> >
>>> >
>>> > 09:52:30,167 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/large-messages,pagingDirectory=/opt/jamq_paging_data/data)
>>> >
>>> > 09:52:33,176 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started Acceptor at 0.0.0.0:61616 for protocols [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE]
>>> >
>>> >
>>> >
>>> > The slaves are able to connect to the master and are reporting that
>>> > they are in standby mode:
>>> >
>>> >
>>> >
>>> > 08:16:57,426 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/large-messages,pagingDirectory=/opt/jamq_paging_data/data)
>>> >
>>> > 08:18:38,529 INFO  [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.3.0 [null] started, waiting live to fail before it gets active
>>> >
>>> >
>>> >
>>> > However, when I kill the master node now, the slave reports that the
>>> > master is gone, but does not become active itself:
>>> >
>>> >
>>> >
>>> > 08:20:14,987 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>>> >
>>> >
>>> >
>>> > When I do a kill -9 on the PID of the master java process, it does not
>>> > even report that the master has gone away.
>>> >
>>> > I also tested this in Artemis 1.5.1, with the same results. Removing
>>> > one of the slaves (to have a simple master-slave setup) does not help
>>> > either.
>>> >
>>> > My expectation is that if the master dies, one of the slaves becomes
>>> > active.
>>> >
>>> > Attached you will find the broker.xml of all 3 nodes.
>>> >
>>> >
>>> >
>>> > Thanks in advance for the help!
>>> >
>>> >
>>> >
>>> > Kind regards,
>>> >
>>> > Gerrit Tamboer
>>> >
>>> >
>>> >
>>> >
>>> > This message is subject to the following E-mail Disclaimer. (
>>> > http://www.crv4all.com/disclaimer-email/) CRV Holding B.V. seats
>>> > according to the articles of association in Arnhem, Dutch trade number
>>> > 09125050.
>>> >
>>> --
>>> Clebert Suconic
>>>
>>>
>>
>> --
>> Clebert Suconic
>
>
>
> --
> Clebert Suconic
>
>



--
Clebert Suconic

