activemq-users mailing list archives

From Joe Niski <joe.ni...@nwea.org>
Subject Durable subscriptions not surviving network disconnect
Date Thu, 09 Jun 2011 16:42:14 GMT
I'm encountering a problem in our production environment that I can 
reproduce in an integration-testing setup. Durable topic subscriptions 
do not fully reconnect after an interruption in network connectivity, 
even though the ActiveMQ brokers re-establish their connection and 
messages flow across queues.

The high-level architecture is a store & forward setup similar to the 
backoffice + retail store example in "ActiveMQ in Action":

- The Central application (on Geronimo 2.1.7) and the Central ActiveMQ 
(5.4.2) broker run on the same machine.

- Multiple remote machines each host a similar pairing of a Remote 
ActiveMQ broker and a Remote application.

- The apps connect to the standalone AMQ brokers via activemq-ra, 
ignoring the AMQ instance embedded in Geronimo.

- The Central app publishes to topics on the Central broker. The topics 
are dynamically included in the Remote brokers' networkConnector 
configuration, which looks like this:

<networkConnectors>
    <networkConnector name="${ApplianceID}"
                      userName="${networkConnectorUserName}"
                      password="${networkConnectorPassword}"
                      uri="static://(ssl://${Central.ServerHostname}:${central_sslPortNumber})?initialReconnectDelay=5000&amp;maxReconnectDelay=10000&amp;useExponentialBackOff=false"
                      duplex="true"
                      dynamicOnly="true">
        <dynamicallyIncludedDestinations>
            <queue physicalName="org.nwea.queues.central.>"/>
            <topic physicalName="org.nwea.topics.>"/>
        </dynamicallyIncludedDestinations>
    </networkConnector>
</networkConnectors>

- MDBs in the Remote application use durable subscriptions to connect to 
the topics on the Remote broker. We see the durable subs show up on the 
Central broker (via the web console).
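For context on how the Remote application's MDBs obtain their durable subscriptions through activemq-ra, a declaration would typically look something like the following. This is a sketch only, assuming EJB 3.0 deployment descriptors and the activemq-ra activation-spec properties `destinationType`, `destination`, `subscriptionDurability`, `clientId` and `subscriptionName`; the bean name, class, topic name, client ID and subscription name are all hypothetical and not taken from the actual Remote application.

```xml
<ejb-jar xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/ejb-jar_3_0.xsd"
         version="3.0">
  <enterprise-beans>
    <message-driven>
      <!-- Hypothetical bean and class names -->
      <ejb-name>ExampleTopicMDB</ejb-name>
      <ejb-class>org.example.ExampleTopicMDB</ejb-class>
      <messaging-type>javax.jms.MessageListener</messaging-type>
      <transaction-type>Container</transaction-type>
      <activation-config>
        <!-- Subscribe to a topic rather than a queue -->
        <activation-config-property>
          <activation-config-property-name>destinationType</activation-config-property-name>
          <activation-config-property-value>javax.jms.Topic</activation-config-property-value>
        </activation-config-property>
        <!-- Hypothetical topic name matching the org.nwea.topics.> pattern -->
        <activation-config-property>
          <activation-config-property-name>destination</activation-config-property-name>
          <activation-config-property-value>org.nwea.topics.example</activation-config-property-value>
        </activation-config-property>
        <!-- Ask activemq-ra for a durable rather than non-durable subscription -->
        <activation-config-property>
          <activation-config-property-name>subscriptionDurability</activation-config-property-name>
          <activation-config-property-value>Durable</activation-config-property-value>
        </activation-config-property>
        <!-- A durable subscription is identified by client ID plus subscription name (both hypothetical) -->
        <activation-config-property>
          <activation-config-property-name>clientId</activation-config-property-name>
          <activation-config-property-value>remote-app-example</activation-config-property-value>
        </activation-config-property>
        <activation-config-property>
          <activation-config-property-name>subscriptionName</activation-config-property-name>
          <activation-config-property-value>exampleSubscription</activation-config-property-value>
        </activation-config-property>
      </activation-config>
    </message-driven>
  </enterprise-beans>
</ejb-jar>
```

In Geronimo the MDB is typically also linked to the activemq-ra instance through its openejb-jar.xml plan, which is not shown here.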

Whenever there's a temporary loss of network connectivity (this happens 
from time to time with the provider hosting our Remotes), the Remote 
brokers can re-connect to the Central broker, but the durable 
subscriptions from Remote do not re-connect. They show up in the Remote 
broker's web console, but not in Central's. Messages on the Central 
broker's topics are not forwarded to the Remote broker's topics.

I've reproduced this behavior in our VMware environment, the only place 
where I can enable debug-level logging:

- I start a batch-publishing job on Central, watch the messages get picked 
up and processed by Remote, then disable the network interface on Remote 
(I've done this for up to a minute so far). Central keeps publishing, 
and Remote finishes processing the messages that had already been 
forwarded to its topics.

- I re-enable Remote's network interface and see in the ActiveMQ logs 
that Remote authenticates to Central and that the DemandForwardingBridge 
is re-established. I see messages flowing on advisory topics. I can send 
a message (via Remote's AMQ console) to a dynamically included queue, 
and it's forwarded to Central. In Remote's AMQ console, I see the 
durable subscriptions from the Remote application's MDBs, but in 
Central's AMQ console, the durable subs appear as "offline".
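The debug-level logging mentioned above can be narrowed to the packages involved in network bridging and durable subscriptions. A minimal sketch, assuming the broker's logging is driven by a log4j 1.2 XML file (ActiveMQ 5.4 ships a log4j.properties by default, where the equivalent would be lines such as log4j.logger.org.apache.activemq.network=DEBUG); the appender name and pattern are hypothetical placeholders for whatever appender the broker already uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">

  <!-- Hypothetical console appender; substitute the broker's existing appender -->
  <appender name="console" class="org.apache.log4j.ConsoleAppender">
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
    </layout>
  </appender>

  <!-- Network bridge classes, including DemandForwardingBridge -->
  <logger name="org.apache.activemq.network">
    <level value="DEBUG"/>
  </logger>

  <!-- Broker-side subscription handling, including durable topic subscriptions -->
  <logger name="org.apache.activemq.broker.region">
    <level value="DEBUG"/>
  </logger>

  <!-- Everything else stays at INFO -->
  <root>
    <priority value="INFO"/>
    <appender-ref ref="console"/>
  </root>
</log4j:configuration>
```

Keeping the root logger at INFO limits the volume of output to the bridge and subscription activity around the disconnect and reconnect.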

The only way we've discovered to bring the durable subscriptions back 
on-line all the way to Central is to restart the Remote Geronimo 
instance. Once restarted, Remote picks up where it left off, and all the 
topic messages are retrieved and processed.

In the debug logs, we've noticed that when Remote AMQ reconnects after 
the outage, the queue and topic connections seem to use different ports 
than before the outage, and we wonder whether this is part of why the 
durable subscriptions fail to reconnect.

I've already tried a few minor variations in the networkConnector 
configuration, the most recent being "useExponentialBackOff=false". In 
addition, I've enabled TCP keepalive in the transportConnectors:

<transportConnectors>
    <transportConnector name="openwire"
                        uri="tcp://0.0.0.0:${remote_openwirePortNumber}?keepAlive=true"/>
    <transportConnector name="ssl"
                        uri="ssl://0.0.0.0:${remote_sslPortNumber}?keepAlive=true"/>
</transportConnectors>

We've already looked at various operating-system issues with the network 
stacks on our servers, and nothing seems to be amiss: no resource 
starvation of any kind. In any case, the real point is that we need the 
durable subs to survive a brief disconnect; AMQ itself seems to 
reconnect just fine. At the moment, getting rid of activemq-ra and the 
Geronimo resource adapters and moving to Spring's JMS support (as one 
consultant suggested) isn't an option for fixing our production issues, 
regardless of how attractive it is in the bigger scheme of things.

This is a real problem for us and our customers. Any guidance is 
appreciated.
-- 

*Joe Niski*
Senior Developer - Information Services  |  NWEA™

PHONE 503.548.5207 | FAX 503.639.7873

NWEA.ORG <http://www.nwea.org/> | Partnering to help all kids learn™

