activemq-dev mailing list archives

From "Gary Tully" <gary.tu...@gmail.com>
Subject Re: Slave broker out of sync with master
Date Thu, 17 Jul 2008 15:19:21 GMT
Hi ying,
I think what you are trying to achieve is not really possible with the
existing master slave architecture.
The master/slave pair is kept in sync by the master, which replicates
broker commands to the slave in real time. If the master dies, clients
can fail over to the slave. Recreating the master/slave pair then
requires a copy of the data and a restart of both brokers.
If the slave dies, both the master and the slave again need to be
restarted, because the master does not remember and replay commands
that the slave may have missed. In other words, a slave cannot
reliably deal with a master broker restart. Using a failover transport
would suggest that it can, but my understanding is that it was never
intended to.
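A minimal sketch of the point above, assuming a deliberately simplified model in which the master forwards each command to the slave only while the link is up and keeps no replay log. `ReplicationSketch`, `Master`, `Slave` and the string commands are hypothetical stand-ins, not ActiveMQ classes.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicationSketch {

    /** Hypothetical slave: applies whatever commands reach it. */
    static class Slave {
        final List<String> applied = new ArrayList<>();
        boolean connected = true;

        void apply(String command) {
            applied.add(command);
        }
    }

    /** Hypothetical master: replicates in real time, with no replay log. */
    static class Master {
        final List<String> applied = new ArrayList<>();
        final Slave slave;

        Master(Slave slave) {
            this.slave = slave;
        }

        void execute(String command) {
            applied.add(command);
            // Forwarded only if the slave is reachable right now; a missed
            // command is never remembered or replayed later.
            if (slave.connected) {
                slave.apply(command);
            }
        }
    }

    public static void main(String[] args) {
        Slave slave = new Slave();
        Master master = new Master(slave);

        master.execute("send m1");
        slave.connected = false;    // slave drops off
        master.execute("send m2");  // missed by the slave
        slave.connected = true;     // slave comes back
        master.execute("ack m1");

        System.out.println("master: " + master.applied);
        System.out.println("slave:  " + slave.applied);
    }
}
```

Running `main` prints `[send m1, send m2, ack m1]` for the master and `[send m1, ack m1]` for the slave: the command missed while disconnected is simply gone, so in this model the only way back to a consistent pair is to copy the master's state and restart.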

gary.

2008/7/17 yinghe0101 <yinghe0101@yahoo.com>:
>
> Hi Gary,
>
> I will see what I can do with the test, but my scenario is a little
> different. On the slave, I used failover for the masterConnectorURI (e.g.
> masterConnectorURI="failover://(tcp://master:61616)") because we want to
> make sure that when we start the master and slave, the slave is attached.
> With plain tcp, there is a chance the slave starts as a master, because
> when it cannot connect to the master it starts anyway. Also,
> shutdownOnMasterFailure does not work with tcp
> (https://issues.apache.org/activemq/browse/AMQ-1813). Even if it did, we
> don't want the slave to try only once and stop.
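A rough model of the two startup behaviours described above, under the simplifying assumption (taken from the description, not from the broker code) that a slave which cannot reach its master on a single attempt ends up running as a master. `SlaveStartupSketch`, `startOnce` and `startWithRetry` are hypothetical; the doubling delay only loosely resembles what a failover transport does while reconnecting.

```java
import java.util.function.BooleanSupplier;

public class SlaveStartupSketch {

    /** Single attempt: if the master is unreachable, this broker acts as a master. */
    static String startOnce(BooleanSupplier connectToMaster) {
        return connectToMaster.getAsBoolean() ? "slave" : "master";
    }

    /** Retries with a doubling delay, so the broker never promotes itself at startup. */
    static String startWithRetry(BooleanSupplier connectToMaster, long initialDelayMs, long maxDelayMs)
            throws InterruptedException {
        long delay = initialDelayMs;
        while (!connectToMaster.getAsBoolean()) {
            Thread.sleep(delay);
            delay = Math.min(delay * 2, maxDelayMs);
        }
        return "slave";
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical master that only becomes reachable on the third attempt.
        int[] attempts = {0};
        BooleanSupplier lateMaster = () -> ++attempts[0] >= 3;

        System.out.println("single attempt: " + startOnce(lateMaster));
        attempts[0] = 0;
        System.out.println("with retry:     " + startWithRetry(lateMaster, 10, 100));
    }
}
```

With a master that only becomes reachable on the third attempt, `startOnce` returns `master` while `startWithRetry` returns `slave` after two short sleeps.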
>
> The problem with using failover for masterConnectorURI is that when it
> reconnects, it does not send the BrokerInfo again, so the master does not
> know it is a slave. Attached you can find my fix for this.
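The patch below boils down to re-announcing the connector's identity whenever the transport reports that the link has resumed. A minimal sketch of that pattern, assuming hypothetical `Link`, `MasterEnd`, `Connector` and `ResumeListener` types (not ActiveMQ APIs), with the string `"BrokerInfo:slave"` standing in for the real `BrokerInfo` command:

```java
import java.util.ArrayList;
import java.util.List;

public class ResendOnResumeSketch {

    /** Hypothetical callback fired when a dropped link comes back. */
    interface ResumeListener {
        void resumed();
    }

    /** Hypothetical master end: peer identity is tracked per connection. */
    static class MasterEnd {
        boolean knowsPeerIsSlave;
        final List<String> received = new ArrayList<>();

        void newConnection() {
            knowsPeerIsSlave = false; // a fresh connection starts anonymous
        }

        void receive(String command) {
            received.add(command);
            if (command.startsWith("BrokerInfo")) {
                knowsPeerIsSlave = true;
            }
        }
    }

    /** Hypothetical link that can drop and transparently reconnect. */
    static class Link {
        final MasterEnd master;
        ResumeListener listener;

        Link(MasterEnd master) {
            this.master = master;
        }

        void oneway(String command) {
            master.receive(command);
        }

        void dropAndReconnect() {
            master.newConnection();
            if (listener != null) {
                listener.resumed();
            }
        }
    }

    /** Hypothetical slave-side connector holding its identity message. */
    static class Connector {
        final Link link;
        final String brokerInfo = "BrokerInfo:slave";

        Connector(Link link, boolean resendOnResume) {
            this.link = link;
            if (resendOnResume) {
                // The idea behind the patch: re-announce on every resume.
                link.listener = () -> link.oneway(brokerInfo);
            }
        }

        void start() {
            link.oneway(brokerInfo); // identity is sent once at startup
        }
    }

    public static void main(String[] args) {
        for (boolean resend : new boolean[] {false, true}) {
            MasterEnd master = new MasterEnd();
            Link link = new Link(master);
            new Connector(link, resend).start();
            link.dropAndReconnect();
            System.out.println("resend=" + resend
                    + ", master knows peer is slave: " + master.knowsPeerIsSlave);
        }
    }
}
```

`main` prints `false` for the run without the resume hook and `true` for the run with it: in this model the master only learns that the reconnected peer is a slave if the identity is sent again.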
>
> The "Slave broker out of sync" exception only happens when I use failover
> for masterConnectorURI, kill the master, restart it, and then, after the
> slave has reconnected, use a producer to send some messages. At the same
> time, other transacted sessions connected to the master
> consume-process-publish-then-commit the messages, so it is a little
> complicated, but our application requires that.
>
> Note that if I do either of the following, the exception does not occur:
> 1. delete both the master's and the slave's data directories (just for
> testing; this cannot happen in a production environment)
> 2. before restarting the master, copy the master's data directory to the
> slave, without killing the slave while it keeps trying to reconnect to
> the master.
>
> Also, on the client side, only the master's URI is in the failover list,
> so the slave is used only for replication.
>
> Since my setup is a little complicated, it might be hard to reproduce it
> in a test case, but I will see what I can do. I hope this explanation is
> clear; any suggestions are appreciated.
>
> ying
>
> here is the patch:
> --- src/main/java/org/apache/activemq/broker/ft/MasterConnector.java	(revision 672308)
> +++ src/main/java/org/apache/activemq/broker/ft/MasterConnector.java	(working copy)
> @@ -70,6 +70,7 @@
>     private SessionInfo sessionInfo;
>     private ProducerInfo producerInfo;
>     private final AtomicBoolean masterActive = new AtomicBoolean();
> +    private BrokerInfo brokerInfo;
>
>     public MasterConnector() {
>     }
> @@ -99,6 +100,7 @@
>         if (!started.compareAndSet(false, true)) {
>             return;
>         }
> +
>         if (remoteURI == null) {
>             throw new IllegalArgumentException("You must specify a remoteURI");
>         }
> @@ -120,6 +122,7 @@
>
>             public void onCommand(Object o) {
>                 Command command = (Command)o;
> +                LOG.debug("## remoteBroker command:"+command);
>                 if (started.get()) {
>                     serviceRemoteCommand(command);
>                 }
> @@ -130,7 +133,17 @@
>                     serviceRemoteException(error);
>                 }
>             }
> +
> +            public void transportResumed() {
> +               try{
> +                       remoteBroker.oneway(brokerInfo);
> +               }catch(IOException e){
> +                       LOG.error("MasterConnector failed to send BrokerInfo in transportResumed:", e);
> +               }
> +               LOG.info("MasterConnector sent BrokerInfo when transport resumed.");
> +            }
>         });
> +
>         try {
>             localBroker.start();
>             remoteBroker.start();
> @@ -139,7 +152,7 @@
>         } catch (Exception e) {
>             masterActive.set(false);
>             LOG.error("Failed to start network bridge: " + e, e);
> -        }
> +        }
>     }
>
>     protected void startBridge() throws Exception {
> @@ -148,10 +161,8 @@
>         connectionInfo.setClientId(idGenerator.generateId());
>         connectionInfo.setUserName(userName);
>         connectionInfo.setPassword(password);
> +        connectionInfo.setBrokerMasterConnector(true);
>         localBroker.oneway(connectionInfo);
> -        ConnectionInfo remoteInfo = new ConnectionInfo();
> -        connectionInfo.copy(remoteInfo);
> -        remoteInfo.setBrokerMasterConnector(true);
>         remoteBroker.oneway(connectionInfo);
>         sessionInfo = new SessionInfo(connectionInfo, 1);
>         localBroker.oneway(sessionInfo);
> @@ -159,7 +170,6 @@
>         producerInfo = new ProducerInfo(sessionInfo, 1);
>         producerInfo.setResponseRequired(false);
>         remoteBroker.oneway(producerInfo);
> -        BrokerInfo brokerInfo = null;
>         if (connector != null) {
>             brokerInfo = connector.getBrokerInfo();
>         } else {
>
>
>
>
> Gary Tully wrote:
>>
>> ying,
>> do you think it would be possible to build a test case that reproduces
>> the problem, possibly based on QueueMasterSlaveTest[1] or something
>> similar?
>>
>> [1]
>> http://svn.apache.org/viewvc/activemq/trunk/activemq-core/src/test/java/org/apache/activemq/broker/ft/QueueMasterSlaveTest.java?view=markup
>>
>> 2008/7/16 yinghe0101 <yinghe0101@yahoo.com>:
>>>
>>> hi,
>>> With the latest trunk, I still get the following:
>>> javax.jms.JMSException: Slave broker out of sync with master: Dispatched
>>> message (ID:yhe-3822-1216229856070-0:0:1:1:1) was not in the pending list
>>>
>>> Thus the MessageAck fails because the message is not in the dispatch list.
>>>
>>> From some investigation, I found that the MessageDispatchNotification
>>> happens before the message is added to the pending list in
>>> PrefetchSubscription.
>>> The following order needs to be enforced (slave adds message to
>>> pending --> slave gets MessageDispatchNotification --> slave gets
>>> MessageAck). Somehow there is a race condition that breaks the sync
>>> between the slave and the master.
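To make the required order concrete, here is a minimal model, assuming a hypothetical `SlaveSubscription` that tracks pending and dispatched message IDs in two sets. It is not ActiveMQ's `PrefetchSubscription`, and the exception text only imitates the error quoted above; it shows why a dispatch notification that overtakes its message fails, and why the later ack then fails as a consequence.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SlaveOrderingSketch {

    /** Hypothetical slave-side view of one prefetch subscription. */
    static class SlaveSubscription {
        final Set<String> pending = new LinkedHashSet<>();
        final Set<String> dispatched = new LinkedHashSet<>();

        // Step 1: the replicated message arrives and becomes pending.
        void addPending(String messageId) {
            pending.add(messageId);
        }

        // Step 2: the dispatch notification moves it from pending to dispatched.
        void onDispatchNotification(String messageId) {
            if (!pending.remove(messageId)) {
                throw new IllegalStateException("Slave out of sync: dispatched message "
                        + messageId + " was not in the pending list");
            }
            dispatched.add(messageId);
        }

        // Step 3: the ack only succeeds for a message that was dispatched.
        void onAck(String messageId) {
            if (!dispatched.remove(messageId)) {
                throw new IllegalStateException("Ack for " + messageId
                        + " is not in the dispatch list");
            }
        }
    }

    public static void main(String[] args) {
        SlaveSubscription ok = new SlaveSubscription();
        ok.addPending("m1");
        ok.onDispatchNotification("m1");
        ok.onAck("m1");
        System.out.println("expected order: ok");

        SlaveSubscription raced = new SlaveSubscription();
        try {
            raced.onDispatchNotification("m1"); // notification overtakes the message
        } catch (IllegalStateException e) {
            System.out.println("raced: " + e.getMessage());
        }
        raced.addPending("m1"); // the message arrives too late
        try {
            raced.onAck("m1"); // never moved to dispatched, so the ack fails too
        } catch (IllegalStateException e) {
            System.out.println("raced: " + e.getMessage());
        }
    }
}
```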
>>>
>>> I have been trying to work out how pending messages get added on the
>>> slave; any explanation or suggestion is appreciated. Thank you.
>>>
>>> ying
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Slave-broker-out-of-sync-with-master-tp18492930p18492930.html
>>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
>>>
>>>
>>
>>
> http://www.nabble.com/file/p18508195/MasterConnectorPatch.txt
> MasterConnectorPatch.txt
> --
> View this message in context: http://www.nabble.com/Slave-broker-out-of-sync-with-master-tp18492930p18508195.html
>
>
