activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stirling Chow (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMQ-4159) Lack of thread-safety in SimpleDiscoveryAgent can cause multiple concurrent attempts to establish bridge, which can result in permanent bridge failure
Date Sat, 03 Nov 2012 22:41:12 GMT

     [ https://issues.apache.org/jira/browse/AMQ-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Stirling Chow updated AMQ-4159:
-------------------------------

    Patch Info: Patch Available
    
> Lack of thread-safety in SimpleDiscoveryAgent can cause multiple concurrent attempts
to establish bridge, which can result in permanent bridge failure
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-4159
>                 URL: https://issues.apache.org/jira/browse/AMQ-4159
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.8.0
>            Reporter: Stirling Chow
>
> Symptom
> =======
> I was diagnosing a deadlock issue in {{DiscoveryNetworkConnector}} and noticed that during
one of the tests, concurrent calls were being made to {{DiscoveryNetworkConnector.onServiceAdd}}
for the same {{DiscoveryEvent}}.  This was unexpected because only a single service (URL)
had been given to {{SimpleDiscoveryAgent}}.  In fact, during one of the tests I observed dozens
of concurrent calls.
> Concurrent attempts to establish a bridge to the *same* remote broker are problematic
because they expose a number of race conditions in {{DiscoveryNetworkConnector}} and {{RegionBroker}}
that can lead to permanent bridge failure, as well as unnecessary thread pool execution/resource
usage and logging.
> The race conditions will be filed as separate issues.  This issue specifically addresses
the bug that causes {{SimpleDiscoveryAgent}} to uncontrollably multiply bridge connection
attempts.
> Cause
> =====
> When {{DemandForwardingBridgeSupport}} handles exceptions from either the local or remote
sides of the the bridge, it fires a "bridge failed" event:
> {code:title=DemandForwardingBridgeSupport.java}
> public void serviceLocalException(Throwable error) {
>     if (!disposed.get()) {
>         LOG.info("Network connection between " + localBroker + " and " + remoteBroker
+ " shutdown due to a local error: " + error);
>         LOG.debug("The local Exception was:" + error, error);
>         brokerService.getTaskRunnerFactory().execute(new Runnable() {
>             public void run() {
>                 ServiceSupport.dispose(getControllingService());
>             }
>         });
>         fireBridgeFailed();
>     }
> }
> public void serviceRemoteException(Throwable error) {
>     if (!disposed.get()) {
>         if (error instanceof SecurityException || error instanceof GeneralSecurityException)
{
>             LOG.error("Network connection between " + localBroker + " and " + remoteBroker
+ " shutdown due to a remote error: " + error);
>         } else {
>             LOG.warn("Network connection between " + localBroker + " and " + remoteBroker
+ " shutdown due to a remote error: " + error);
>         }
>         LOG.debug("The remote Exception was: " + error, error);
>         brokerService.getTaskRunnerFactory().execute(new Runnable() {
>             public void run() {
>                 ServiceSupport.dispose(getControllingService());
>             }
>         });
>         fireBridgeFailed();
>     }
> }
> private void fireBridgeFailed() {
>     NetworkBridgeListener l = this.networkBridgeListener;
>     if (l != null) {
>         l.bridgeFailed();
>     }
> }
> {code}
> {{DiscoveryNetworkConnector}} is the {{NetworkBridgeListener}}, and it's {{bridgeFailed}}
method calls back to {{SimpleDiscoveryAgent.serviceFailed(...)}}:
> {code:title=DiscoveryNetworkConnectol.java}
> protected NetworkBridge createBridge(Transport localTransport, Transport remoteTransport,
final DiscoveryEvent event) {
>     class DiscoverNetworkBridgeListener extends MBeanNetworkListener {
>         public DiscoverNetworkBridgeListener(BrokerService brokerService, ObjectName
connectorName) {
>             super(brokerService, connectorName);
>         }
>         public void bridgeFailed() {
>             if (!serviceSupport.isStopped()) {
>                 try {
>                     discoveryAgent.serviceFailed(event);
>                 } catch (IOException e) {
>                 }
>             }
>         }
>     }
> ...
> {code}
> In response, {{SimpleDiscoveryAgent.serviceFailed(...)}} pauses for the {{reconnectDelay}}
before attempting to re-establish the bridge via {{DiscoveryNetworkConnector.onServiceAdd(...)}}:
> {code:title=SimpleDiscoveryAgent.java}
> public void serviceFailed(DiscoveryEvent devent) throws IOException {
>     final SimpleDiscoveryEvent event = (SimpleDiscoveryEvent)devent;
>     if (sevent.failed.compareAndSet(false, true)) {
>         listener.onServiceRemove(sevent);
>         taskRunner.execute(new Runnable() {
>             public void run() {
>                 // We detect a failed connection attempt because the service
>                 // fails right
>                 // away.
>                 if (event.connectTime + minConnectTime > System.currentTimeMillis())
{
>                     LOG.debug("Failure occurred soon after the discovery event was generated.
 It will be classified as a connection failure: "+event);
> ...
>                     synchronized (sleepMutex) {
>                         try {
>                             if (!running.get()) {
>                                 LOG.debug("Reconnecting disabled: stopped");
>                                 return;
>                             }
>                             LOG.debug("Waiting "+event.reconnectDelay+" ms before attempting
to reconnect.");
>                             sleepMutex.wait(event.reconnectDelay);
>                         } catch (InterruptedException ie) {
>                             LOG.debug("Reconnecting disabled: " + ie);
>                             Thread.currentThread().interrupt();
>                             return;
>                         }
>                     }
> ...
>                 event.connectTime = System.currentTimeMillis();
>                 event.failed.set(false);
>                 listener.onServiceAdd(event);
>             }
>         }, "Simple Discovery Agent");
>     }
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message