Return-Path: X-Original-To: apmail-activemq-dev-archive@www.apache.org Delivered-To: apmail-activemq-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AD1DDDA76 for ; Sat, 3 Nov 2012 22:41:13 +0000 (UTC) Received: (qmail 97036 invoked by uid 500); 3 Nov 2012 22:41:13 -0000 Delivered-To: apmail-activemq-dev-archive@activemq.apache.org Received: (qmail 96986 invoked by uid 500); 3 Nov 2012 22:41:13 -0000 Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@activemq.apache.org Delivered-To: mailing list dev@activemq.apache.org Received: (qmail 96908 invoked by uid 99); 3 Nov 2012 22:41:13 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 03 Nov 2012 22:41:13 +0000 Date: Sat, 3 Nov 2012 22:41:12 +0000 (UTC) From: "Stirling Chow (JIRA)" To: dev@activemq.apache.org Message-ID: <2098846985.65259.1351982473010.JavaMail.jiratomcat@arcas> In-Reply-To: <696109241.65258.1351982472883.JavaMail.jiratomcat@arcas> Subject: [jira] [Updated] (AMQ-4159) Lack of thread-safety in SimpleDiscoveryAgent can cause multiple concurrent attempts to establish bridge, which can result in permanent bridge failure MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/AMQ-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stirling Chow updated AMQ-4159: ------------------------------- Patch Info: Patch Available > Lack of thread-safety in SimpleDiscoveryAgent can cause multiple concurrent attempts to establish bridge, which can result in permanent bridge failure > ------------------------------------------------------------------------------------------------------------------------------------------------------ > > Key: AMQ-4159 > URL: https://issues.apache.org/jira/browse/AMQ-4159 > Project: ActiveMQ > Issue Type: Bug > Affects Versions: 5.8.0 > Reporter: Stirling Chow > > Symptom > ======= > I was diagnosing a deadlock issue in {{DiscoveryNetworkConnector}} and noticed that during one of the tests, concurrent calls were being made to {{DiscoveryNetworkConnector.onServiceAdd}} for the same {{DiscoveryEvent}}. This was unexpected because only a single service (URL) had been given to {{SimpleDiscoveryAgent}}. In fact, during one of the tests I observed dozens of concurrent calls. > Concurrent attempts to establish a bridge to the *same* remote broker are problematic because they expose a number of race conditions in {{DiscoveryNetworkConnector}} and {{RegionBroker}} that can lead to permanent bridge failure, as well as unnecessary thread pool execution/resource usage and logging. > The race conditions will be filed as separate issues. This issue specifically addresses the bug that causes {{SimpleDiscoveryAgent}} to uncontrollably multiply bridge connection attempts. > Cause > ===== > When {{DemandForwardingBridgeSupport}} handles exceptions from either the local or remote sides of the the bridge, it fires a "bridge failed" event: > {code:title=DemandForwardingBridgeSupport.java} > public void serviceLocalException(Throwable error) { > if (!disposed.get()) { > LOG.info("Network connection between " + localBroker + " and " + remoteBroker + " shutdown due to a local error: " + error); > LOG.debug("The local Exception was:" + error, error); > brokerService.getTaskRunnerFactory().execute(new Runnable() { > public void run() { > ServiceSupport.dispose(getControllingService()); > } > }); > fireBridgeFailed(); > } > } > public void serviceRemoteException(Throwable error) { > if (!disposed.get()) { > if (error instanceof SecurityException || error instanceof GeneralSecurityException) { > LOG.error("Network connection between " + localBroker + " and " + remoteBroker + " shutdown due to a remote error: " + error); > } else { > LOG.warn("Network connection between " + localBroker + " and " + remoteBroker + " shutdown due to a remote error: " + error); > } > LOG.debug("The remote Exception was: " + error, error); > brokerService.getTaskRunnerFactory().execute(new Runnable() { > public void run() { > ServiceSupport.dispose(getControllingService()); > } > }); > fireBridgeFailed(); > } > } > private void fireBridgeFailed() { > NetworkBridgeListener l = this.networkBridgeListener; > if (l != null) { > l.bridgeFailed(); > } > } > {code} > {{DiscoveryNetworkConnector}} is the {{NetworkBridgeListener}}, and it's {{bridgeFailed}} method calls back to {{SimpleDiscoveryAgent.serviceFailed(...)}}: > {code:title=DiscoveryNetworkConnectol.java} > protected NetworkBridge createBridge(Transport localTransport, Transport remoteTransport, final DiscoveryEvent event) { > class DiscoverNetworkBridgeListener extends MBeanNetworkListener { > public DiscoverNetworkBridgeListener(BrokerService brokerService, ObjectName connectorName) { > super(brokerService, connectorName); > } > public void bridgeFailed() { > if (!serviceSupport.isStopped()) { > try { > discoveryAgent.serviceFailed(event); > } catch (IOException e) { > } > } > } > } > ... > {code} > In response, {{SimpleDiscoveryAgent.serviceFailed(...)}} pauses for the {{reconnectDelay}} before attempting to re-establish the bridge via {{DiscoveryNetworkConnector.onServiceAdd(...)}}: > {code:title=SimpleDiscoveryAgent.java} > public void serviceFailed(DiscoveryEvent devent) throws IOException { > final SimpleDiscoveryEvent event = (SimpleDiscoveryEvent)devent; > if (sevent.failed.compareAndSet(false, true)) { > listener.onServiceRemove(sevent); > taskRunner.execute(new Runnable() { > public void run() { > // We detect a failed connection attempt because the service > // fails right > // away. > if (event.connectTime + minConnectTime > System.currentTimeMillis()) { > LOG.debug("Failure occurred soon after the discovery event was generated. It will be classified as a connection failure: "+event); > ... > synchronized (sleepMutex) { > try { > if (!running.get()) { > LOG.debug("Reconnecting disabled: stopped"); > return; > } > LOG.debug("Waiting "+event.reconnectDelay+" ms before attempting to reconnect."); > sleepMutex.wait(event.reconnectDelay); > } catch (InterruptedException ie) { > LOG.debug("Reconnecting disabled: " + ie); > Thread.currentThread().interrupt(); > return; > } > } > ... > event.connectTime = System.currentTimeMillis(); > event.failed.set(false); > listener.onServiceAdd(event); > } > }, "Simple Discovery Agent"); > } > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira