activemq-dev mailing list archives

From "Stirling Chow (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQ-2541) Extremely slow broker startup when using SimpleDiscoveryAgent with an inactive Network of Brokers.
Date Wed, 14 Nov 2012 18:26:12 GMT

    [ https://issues.apache.org/jira/browse/AMQ-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497315#comment-13497315 ]

Stirling Chow commented on AMQ-2541:
------------------------------------

This issue is complementary to AMQ-3195.  AMQ-3195 addressed the case where *multiple* network
connectors were started synchronously, but within a single network connector the bridges were
still started synchronously.  AMQ-2541 addresses the behaviour within a single network connector
so that its bridges are started asynchronously.  The rationale for AMQ-2541 is the same as for
AMQ-3195: establishing a bridge can sometimes be very slow, and a slow bridge holds up the
formation of subsequent bridges.

In our use case, we have only a single network connector, but it is configured with multiple
URLs.  AMQ-3195 does not help here because the URLs within a single network connector are
still attempted synchronously.  AMQ-2541 adds asynchronous connection establishment within
the network connector.
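
To put rough numbers on this, the sketch below takes four timestamps from the log in the quoted
description (the start of the first connection attempt and the three timeout warnings) and
compares the sum of the three attempts with the longest single attempt.  The class and method
names are made up for illustration, and treating the longest attempt as the cost of attempting
the URLs concurrently is an assumption that ignores scheduling overhead.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ActiveMQ: arithmetic on the quoted log timestamps.
class BridgeStartupTiming {
    // Timestamps copied from the quoted log; the comma before the milliseconds
    // is written as a dot so that LocalTime.parse accepts them.
    static final List<LocalTime> MARKS = List.of(
            LocalTime.parse("15:24:47.158"),  // first "Establishing network connection"
            LocalTime.parse("15:26:11.314"),  // timeout for 10.10.60.78
            LocalTime.parse("15:27:35.299"),  // timeout for 10.9.62.135
            LocalTime.parse("15:28:59.314")); // timeout for 10.10.60.75

    // Duration of each connection attempt, in milliseconds.
    static long[] attemptMillis() {
        long[] out = new long[MARKS.size() - 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = Duration.between(MARKS.get(i), MARKS.get(i + 1)).toMillis();
        }
        return out;
    }

    // Attempts made one after another: the delays add up.
    static long sequentialMillis() {
        return Arrays.stream(attemptMillis()).sum();
    }

    // Attempts made concurrently (assumed): bounded by the slowest one.
    static long concurrentMillis() {
        return Arrays.stream(attemptMillis()).max().orElse(0L);
    }

    public static void main(String[] args) {
        System.out.println("sequential ms: " + sequentialMillis());
        System.out.println("concurrent ms: " + concurrentMillis());
    }
}
```

For the logged run this works out to roughly 252 seconds spent on the three attempts in sequence,
against roughly 84 seconds for the slowest single attempt.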
                
> Extremely slow broker startup when using SimpleDiscoveryAgent with an inactive Network of Brokers.
> --------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-2541
>                 URL: https://issues.apache.org/jira/browse/AMQ-2541
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Connector
>    Affects Versions: 5.3.0
>            Reporter: Stirling Chow
>             Fix For: 5.x
>
>         Attachments: AMQ2541.patch, AMQ2541Test.java
>
>
> Symptom
> ========
> An AMQ broker that is configured to join a statically-defined network of brokers (i.e., using uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..)") can have an extremely long startup time (on the order of 5+ minutes) if many of the brokers in the network are not alive.
> The following log entries show the startup of an AMQ broker (http://192.168.170.112:50000) that is configured to join a network with three other brokers:
> http://10.10.60.78:50000
> http://10.9.62.135:50000
> http://10.10.60.75:50000
> The three other brokers have not yet started.
> The log file shows that it takes just over 4 minutes for BrokerService#start() to return control to the calling thread (AlarmPoint Node-main):
> 2009-12-18 15:24:46,783 [AlarmPoint Node-main] INFO    -  - ActiveMQ 5.3.0 JMS Message Broker (localhost) is starting
> ...
> 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO    -  - Connector http://192.168.170.112:50000 Started
> 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO    -  - Establishing network connection from vm://localhost to http://10.10.60.78:50000
> ...
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] WARN    -  - Could not start network bridge between: vm://localhost and: http://10.10.60.78:50000 due to: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] DEBUG   -  - Start failure exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] INFO    -  - Establishing network connection from vm://localhost to http://10.9.62.135:50000
> ...
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] WARN    -  - Could not start network bridge between: vm://localhost and: http://10.9.62.135:50000 due to: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] DEBUG   -  - Start failure exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] INFO    -  - Establishing network connection from vm://localhost to http://10.10.60.75:50000
> ...
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] WARN    -  - Could not start network bridge between: vm://localhost and: http://10.10.60.75:50000 due to: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] DEBUG   -  - Start failure exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO    -  - Network Connector bridge Started
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO    -  - ActiveMQ JMS Message Broker (localhost, ID:vic-esx4-ns1-1280-1261178686846-0:0) started
> Cause
> =====
> The broker's network connector is implemented by org.apache.activemq.network.DiscoveryNetworkConnector, which in turn uses org.apache.activemq.transport.discovery.simple.SimpleDiscoveryAgent to determine whether the URLs configured in uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..)" are active.  SimpleDiscoveryAgent#start() has this loop:
>     public void start() throws Exception {
>         running.set(true);
>         for (int i = 0; i < services.length; i++) {
>             listener.onServiceAdd(new SimpleDiscoveryEvent(services[i]));
>         }
>     }
> listener.onServiceAdd(...) is called for each URL and is implemented by DiscoveryNetworkConnector#onServiceAdd(...).  The main thread calls BrokerService#start(), which calls DiscoveryNetworkConnector#start(), which calls SimpleDiscoveryAgent#start(), which in turn calls DiscoveryNetworkConnector#onServiceAdd(...) sequentially for each URL.  Since the URLs being "discovered" are inactive, each DiscoveryNetworkConnector#onServiceAdd(...) call blocks for ~1m30s (this will depend on network configuration).  This blocks the main thread that is trying to start the broker.  If there are several inactive URLs, the total blocking time becomes excessive.
> Solution
> =======
> If you follow through the DiscoveryNetworkConnector#onServiceAdd(...) method, it eventually calls SimpleDiscoveryAgent#serviceFailed(...) for each inactive URL.  In turn, SimpleDiscoveryAgent#serviceFailed(...) launches an asynchronous task that pauses for the configured reconnect delay and then retries the call to DiscoveryNetworkConnector#onServiceAdd(...).  Because these retries already run asynchronously, it must be safe to call DiscoveryNetworkConnector#onServiceAdd(...) concurrently.  Therefore, SimpleDiscoveryAgent#start()'s loop should be changed to launch asynchronous tasks so that the DiscoveryNetworkConnector#onServiceAdd(...) calls are made concurrently rather than sequentially.
> This solution has the benefit of returning control immediately to the caller of SimpleDiscoveryAgent#start(...), thus starting the broker faster, and it allows network discovery to find "active" URLs much faster (i.e., with the sequential loop, if the "active" URL is the last one, its discovery is significantly delayed).
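
The loop change proposed under "Solution" can be sketched in plain Java as follows.  This is only
an illustration of the idea, not the contents of AMQ2541.patch: the Listener interface, the class
name, the demo helper, and the choice of a cached thread pool are all assumptions, and none of
them are ActiveMQ types.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a discovery agent; not the ActiveMQ class.
class AsyncDiscoverySketch {
    /** Hypothetical callback, standing in for the discovery listener. */
    interface Listener {
        void onServiceAdd(String serviceUrl);
    }

    private final String[] services;
    private final Listener listener;
    // Daemon threads so that a still-blocked attempt cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "discovery-sketch");
        t.setDaemon(true);
        return t;
    });

    AsyncDiscoverySketch(Listener listener, String... services) {
        this.listener = listener;
        this.services = services;
    }

    // Submits one task per URL and returns without waiting for any of them.
    void start() {
        for (final String service : services) {
            executor.execute(() -> listener.onServiceAdd(service));
        }
    }

    void stop() {
        executor.shutdownNow();
    }

    // Runs the sketch against a listener that blocks for delayMillis per URL.
    // Returns { milliseconds spent inside start(), number of URLs notified }.
    static long[] demo(int urls, long delayMillis) {
        List<String> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(urls);
        Listener slow = url -> {
            try {
                Thread.sleep(delayMillis); // simulates a slow or timing-out connect
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            seen.add(url);
            done.countDown();
        };
        String[] targets = new String[urls];
        for (int i = 0; i < urls; i++) {
            targets[i] = "tcp://host" + (i + 1) + ":61616";
        }
        AsyncDiscoverySketch agent = new AsyncDiscoverySketch(slow, targets);
        long t0 = System.nanoTime();
        agent.start();
        long startMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        agent.stop();
        return new long[] { startMillis, seen.size() };
    }

    public static void main(String[] args) {
        long[] result = demo(3, 500);
        System.out.println("start() returned after ms: " + result[0]);
        System.out.println("URLs notified: " + result[1]);
    }
}
```

With this shape, start() only pays the cost of submitting the tasks, so the caller is not held up
by a slow URL.  The listener must tolerate concurrent calls, which is the property the
description argues already holds for onServiceAdd(...).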

