Return-Path: Delivered-To: apmail-activemq-dev-archive@www.apache.org Received: (qmail 64090 invoked from network); 19 Dec 2009 22:04:25 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 19 Dec 2009 22:04:25 -0000 Received: (qmail 19801 invoked by uid 500); 19 Dec 2009 22:04:24 -0000 Delivered-To: apmail-activemq-dev-archive@activemq.apache.org Received: (qmail 19732 invoked by uid 500); 19 Dec 2009 22:04:24 -0000 Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@activemq.apache.org Delivered-To: mailing list dev@activemq.apache.org Received: (qmail 19722 invoked by uid 99); 19 Dec 2009 22:04:24 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 19 Dec 2009 22:04:24 +0000 X-ASF-Spam-Status: No, hits=-1998.5 required=10.0 tests=ALL_TRUSTED,NORMAL_HTTP_TO_IP,WEIRD_PORT X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 19 Dec 2009 22:04:15 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id F26A1234C045 for ; Sat, 19 Dec 2009 14:03:53 -0800 (PST) Message-ID: <1889428079.1261260233985.JavaMail.jira@brutus> Date: Sat, 19 Dec 2009 14:03:53 -0800 (PST) From: "Stirling Chow (JIRA)" To: dev@activemq.apache.org Subject: [jira] Created: (AMQ-2541) Extremely slow broker startup when using SimpleDiscoveryAgent with an inactive Network of Brokers. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: ae95407df07c98740808b2ef9da0087c X-Virus-Checked: Checked by ClamAV on apache.org Extremely slow broker startup when using SimpleDiscoveryAgent with an inactive Network of Brokers. -------------------------------------------------------------------------------------------------- Key: AMQ-2541 URL: https://issues.apache.org/activemq/browse/AMQ-2541 Project: ActiveMQ Issue Type: Bug Components: Broker, Connector Affects Versions: 5.3.0 Reporter: Stirling Chow Symptom ======== An AMQ broker that is configured to join a statically-defined (i.e., using uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..) ) network of brokers can have an extremely long startup time (in the order of 5+ minutes), if many of the brokers in the network are not alive. The following log entires show the startup of an AMQ Broker (http://192.168.170.112:50000) that is configured to join a network with three other brokers: http://10.10.60.78:50000 http://10.9.62.135:50000 http://10.10.60.75:50000) The three other have not yet started. The log file shows that it takes nearly 4 minutes from BrokerService#start() to return control to the calling thread (AlarmPoint Node-main): 2009-12-18 15:24:46,783 [AlarmPoint Node-main] INFO - - ActiveMQ 5.3.0 JMS Message Broker (localhost) is starting ... 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO - - Connector http://192.168.170.112:50000 Started 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO - - Establishing network connection from vm://localhost to http://10.10.60.78:50000 ... 2009-12-18 15:26:11,314 [AlarmPoint Node-main] WARN - - Could not start network bridge between: vm://localhost and: http://10.10.60.78:50000 due to: java.net.ConnectException: Connection timed out: connect 2009-12-18 15:26:11,314 [AlarmPoint Node-main] DEBUG - - Start failure exception: java.net.ConnectException: Connection timed out: connect 2009-12-18 15:26:11,314 [AlarmPoint Node-main] INFO - - Establishing network connection from vm://localhost to http://10.9.62.135:50000 ... 2009-12-18 15:27:35,299 [AlarmPoint Node-main] WARN - - Could not start network bridge between: vm://localhost and: http://10.9.62.135:50000 due to: java.net.ConnectException: Connection timed out: connect 2009-12-18 15:27:35,299 [AlarmPoint Node-main] DEBUG - - Start failure exception: java.net.ConnectException: Connection timed out: connect 2009-12-18 15:27:35,299 [AlarmPoint Node-main] INFO - - Establishing network connection from vm://localhost to http://10.10.60.75:50000 ... 2009-12-18 15:28:59,314 [AlarmPoint Node-main] WARN - - Could not start network bridge between: vm://localhost and: http://10.10.60.75:50000 due to: java.net.ConnectException: Connection timed out: connect 2009-12-18 15:28:59,314 [AlarmPoint Node-main] DEBUG - - Start failure exception: java.net.ConnectException: Connection timed out: connect 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO - - Network Connector bridge Started 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO - - ActiveMQ JMS Message Broker (localhost, ID:vic-esx4-ns1-1280-1261178686846-0:0) started Cause ===== The broker's network connector is implemented by org.apache.activemq.network.DiscoveryNetworkConnector, which in turn uses org.apache.activemq.transport.discovery.simple.SimpleDiscoveryAgent to determine whether the URLs configured in uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..) " are active. SimpleDiscoveryAgent#start() has this loop: public void start() throws Exception { running.set(true); for (int i = 0; i < services.length; i++) { listener.onServiceAdd(new SimpleDiscoveryEvent(services[i])); } } "listener.onServiceAdd(...) " is called for each URL and is implemented by DiscoveryNetworkConnector#onServiceAdd(...). The main thread calls BrokerService#start() which calls DiscoveryNetworkConnector#start() which calls SimpleDiscoveryAgent#start(), which sequentially calls DiscoveryNetworkConnector#onServiceAdd(...). Since the URLs being "discovered" are inactive, DiscoveryNetworkConnector#onServiceAdd(...) blocks ~1m30s (this will depend on network configuration) for each URL. This blocks the main thread that is trying to start the broker. If there are several inactive URLs, then the blocking time becomes excessive. Solution ======= If you follow through the DiscoveryNetworkConnector#onServiceAdd(...) method, it eventially calls SimpleDiscoveryAgent#serviceFailed(...) for each inactive URL. In turn SimpleDiscoveryAgent#serviceFailed(...) launches an asynchronous task that pauses for the configured reconnect delay, and then retries the call to DiscoveryNetworkConnector#onServiceAdd(...). So it must be safe to call DiscoveryNetworkConnector#onServiceAdd(...) concurrently. Therefore, SimpleDiscoveryAgent#start()'s loop should be changed to launch asynchronous tasks to make the DiscoveryNetworkConnector#onServiceAdd(...) calls concurrently rather than synchronously. This solution has the benefit of returning control immediately to the caller of SimpleDiscoveryAgent#start(...), thus starting the broker faster, and allows the network discovery to find "active" URLs much faster (i.e., with the sequential loop, if the "active" URL is the last one, its discovery is significantly delayed). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.