activemq-users mailing list archives

From Christopher Wood <christopher_w...@pobox.com>
Subject Re: prefetch limit and clustering
Date Wed, 27 Apr 2016 17:26:29 GMT
(inline)

On Tue, Apr 26, 2016 at 08:29:23PM -0600, Tim Bain wrote:
> Cross-datacenter connectivity is definitely possible.
> 
> You can configure ActiveMQ to discard messages once the dispatch queue is
> full, and you can specify strategies for deciding which messages to
> discard, but it appears that you're already using them so maybe you're
> looking for something different?

I'd like for the cluster to resume being clustered without manual intervention after network
connectivity is restored. Everything else works well.

Also, as an update on the config, I tried out

<constantPendingMessageLimitStrategy limit="50"/>

on all brokers, and that appears to have dealt with the error message at least. I'm still
seeing the apparent clustering failure.
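For reference, here is a minimal sketch of where that limit sits in activemq.xml, assuming the same topic=">" policy entry as in the config further down (the limit value is illustrative, not a recommendation):

```xml
<!-- Sketch only: once a slow topic subscription has 50 messages pending,
     discard per the configured messageEvictionStrategy (oldest-first here).
     The limit value is illustrative. -->
<policyEntry topic=">" producerFlowControl="false" usePrefetchExtension="false">
  <messageEvictionStrategy>
    <oldestMessageEvictionStrategy/>
  </messageEvictionStrategy>
  <pendingMessageLimitStrategy>
    <constantPendingMessageLimitStrategy limit="50"/>
  </pendingMessageLimitStrategy>
</policyEntry>
```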

> Reconnecting automatically is the default behavior for networkConnectors
> using the static transport, but first your broker has to recognize that the
> remote broker is no longer available, and that doesn't happen
> instantaneously for network failures.  Till it does, the remote broker will
> be considered slow (if there are enough messages pending), and you'll see
> log lines like those, after which I'd expect you'll see lines about trying
> and failing to reconnect, until connectivity is restored and the logs go
> back to normal.  Is that the behavior you're seeing?

We had another cluster breakdown last night and I am seeing a successful reconnection attempt
(pardon privacy munging).

North America, centre of the star (first log lines in several hours of the log):

INFO   | jvm 1    | 2016/04/26 19:41:40 |  WARN | Network connection between vm://mcomq3.me.com#20 and ssl:///1.2.3.4:52022 shutdown due to a remote error: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>30000) long: tcp://1.2.3.4:52022
INFO   | jvm 1    | 2016/04/26 19:41:40 |  INFO | mcomq3.me.com bridge to mcomq4.me.eu stopped
INFO   | jvm 1    | 2016/04/26 19:41:41 |  INFO | Started responder end of duplex bridge mcomq4.me.eu-mcomq3.me.com-topics@ID:mcomq4-53890-1461690928760-0:1
INFO   | jvm 1    | 2016/04/26 19:41:41 |  INFO | Network connection between vm://mcomq3.me.com#40 and ssl:///1.2.3.4:52044 (mcomq4.me.eu) has been established.

Europe (also first log message in several hours):

INFO   | jvm 1    | 2016/04/26 23:41:40 |  WARN | Network connection between vm://mcomq4.me.eu#0 and ssl://mcomq3.me/5.6.7.8:61617 shutdown due to a remote error: java.io.EOFException
INFO   | jvm 1    | 2016/04/26 23:41:40 |  INFO | Establishing network connection from vm://mcomq4.me.eu?async=false to ssl://mcomq3.me:61617
INFO   | jvm 1    | 2016/04/26 23:41:40 |  INFO | mcomq4.me.eu bridge to mcomq3.me stopped
INFO   | jvm 1    | 2016/04/26 23:41:41 |  INFO | Network connection between vm://mcomq4.me.eu#8 and ssl://mcomq3.me/5.6.7.8:61617 (mcomq3.me) has been established.

However, it turned out the clustering had fallen flat on its face again.

Having read what you said about restarts, I restarted the 3 brokers at the leaves (terminology?)
of the star. I saw them connect in the logs, but no messages were being passed through
according to my mco client tests.

After that I restarted only the central broker; the other brokers reconnected and I had
the cluster back. (Previously I would bring them all down and start them centre-first, which
would bring the cluster back.)

So the actual reconnection behaviour is definitely working from the client side. I'm pondering
whether it's something about the central broker in the star that is having trouble after these
disconnections. Any hints are very much welcome; this is almost certainly self-inflicted.
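One thing I'm considering trying, based on my reading of the Networks of Brokers docs: the static discovery URI appears to accept reconnect-backoff options that get passed down to the bridge, so the leaf could retry the centre on a bounded schedule instead of backing off indefinitely. A sketch against our topics connector (the option names and values are my assumption from the docs, untested here):

```xml
<!-- Sketch: cap the bridge's reconnect backoff at 30s and disable exponential
     backoff, so the leaf keeps retrying the centre broker on a steady schedule.
     Option names taken from the Networks of Brokers docs; values illustrative. -->
<networkConnector
    name="mcomq4.me.eu-mcomq3.me.com-topics"
    uri="static:(ssl://mcomq3.me.com:61617)?maxReconnectDelay=30000&amp;useExponentialBackOff=false"
    duplex="true"
    decreaseNetworkConsumerPriority="true"
    networkTTL="3"
    dynamicOnly="true"/>
```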

> And yes, in the case of a network partition, a broker in a given partition
> should be able to service the clients in that partition, though not
> necessarily the clients in another partition.
> 
> Tim
> On Apr 25, 2016 11:22 AM, "Christopher Wood" <christopher_wood@pobox.com>
> wrote:
> 
> > As background, there is an activemq cluster (5.13.2 on CentOS 6.7, star
> > topology) here to support mcollective. One datacenter is on the other side
> > of the Atlantic and every time inter-datacenter connectivity is interrupted
> > we see this prefetch log fragment and the clustering to that activemq
> > instance stops working.
> >
> > INFO   | jvm 1    | 2016/04/22 21:47:45 |  WARN | TopicSubscription:
> > consumer=mcomq4.me.eu->mcomq3.me.com-43531-1461361657724-32:1:1:1,
> > destinations=75, dispatched=1000, delivered=0, matched=1001, discarded=0:
> > has twice its prefetch limit pending, without an ack; it appears to be slow
> >
> > How would I get the activemq initiating the connection to stop clogging
> > like this and just try to reconnect periodically?
> >
> > Or is it even reasonable to cluster activemq between datacenters?
> >
> > More:
> >
> > I haven't found any activemq.xml setting which reads like "automatically
> > try to reconnect" or "just throw away older messages". There are
> > activemq.xml bits below.
> >
> > The clustering works well until that log line. The actual instance in
> > Europe and daemons connecting to it work just fine after the log line above
> > as long as I keep my requests local to that datacenter.
> >
> >
> > Bits from activemq.xml:
> >
> > <destinationPolicy>
> >   <policyMap>
> >     <policyEntries>
> >       <policyEntry topic=">" producerFlowControl="false"
> > usePrefetchExtension="false">
> >         <messageEvictionStrategy>
> >           <oldestMessageEvictionStrategy/>
> >         </messageEvictionStrategy>
> >         <pendingMessageLimitStrategy>
> >           <prefetchRatePendingMessageLimitStrategy multiplier="2"/>
> >         </pendingMessageLimitStrategy>
> >       </policyEntry>
> >       <policyEntry queue="*.reply.>" gcInactiveDestinations="true"
> > inactiveTimoutBeforeGC="300000" />
> >     </policyEntries>
> >   </policyMap>
> > </destinationPolicy>
> >
> > <networkConnectors>
> >   <networkConnector
> >       name="mcomq4.me.eu-mcomq3.me.com-topics"
> >       uri="static:(ssl://mcomq3.me.com:61617)"
> >       userName="amq"
> >       password="password"
> >       duplex="true"
> >       decreaseNetworkConsumerPriority="true"
> >       networkTTL="3"
> >       dynamicOnly="true">
> >     <excludedDestinations>
> >       <queue physicalName=">" />
> >     </excludedDestinations>
> >   </networkConnector>
> >   <networkConnector
> >       name="mcomq4.me.eu-mcomq3.me.com-queues"
> >       uri="static:(ssl://mcomq3.me.com:61617)"
> >       userName="amq"
> >       password="password"
> >       duplex="true"
> >       decreaseNetworkConsumerPriority="true"
> >       networkTTL="3"
> >       dynamicOnly="true"
> >       conduitSubscriptions="false">
> >     <excludedDestinations>
> >       <topic physicalName=">" />
> >     </excludedDestinations>
> >   </networkConnector>
> > </networkConnectors>
> >
> > <transportConnectors>
> >     <transportConnector name="stomp+nio+ssl"
> >         uri="stomp+ssl://0.0.0.0:61614?needClientAuth=true&amp;transport.enabledProtocols=TLSv1,TLSv1.1,TLSv1.2&amp;transport.hbGracePeriodMultiplier=5"/>
> >     <transportConnector name="openwire+nio+ssl"
> >         uri="ssl://0.0.0.0:61617?needClientAuth=true&amp;transport.enabledProtocols=TLSv1,TLSv1.1,TLSv1.2"/>
> > </transportConnectors>
> >
> >
> > Other things I've read to try and understand this:
> >
> >
> > https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_A-MQ/6.0/html-single/Using_Networks_of_Brokers/index.html
> >
> >
> > https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_A-MQ/6.0/html-single/Tuning_Guide/
> >
> > http://activemq.apache.org/slow-consumer-handling.html
> >
> >
> > My previous threads elsewhere, when I did not understand that it was a
> > specific network event causing clustering to break:
> >
> > https://groups.google.com/forum/#!topic/mcollective-users/MkHSVHt9uEI
> >
> > https://groups.google.com/forum/#!topic/mcollective-users/R2mEnuV5eK8
> >
