activemq-dev mailing list archives

From "Howard Orner (JIRA)" <j...@apache.org>
Subject [jira] Created: (AMQ-1709) Network of Brokers Memory Leak Due to Race Condition
Date Fri, 02 May 2008 17:57:43 GMT
Network of Brokers Memory Leak Due to Race Condition
----------------------------------------------------

                 Key: AMQ-1709
                 URL: https://issues.apache.org/activemq/browse/AMQ-1709
             Project: ActiveMQ
          Issue Type: Bug
          Components: Broker, Transport
    Affects Versions: 5.0.0, 4.1.2
            Reporter: Howard Orner


When you have a network of brokers configuration with at least 3 brokers, such as:

<broker brokerName="A" persistent="false" ...
...
<transportConnector name="AListener" uri="tcp://localhost:61610"/>
...
<networkConnector name="BConnector" uri="static:(tcp://localhost:61620)"/>
<networkConnector name="CConnector" uri="static:(tcp://localhost:61630)"/>

with the other brokers having a similar configuration, and you have subscribers trying to
connect to all of the brokers, you can hit a race condition at start-up where the transports
accept connections from subscribers before the network connectors are initialized.  In
BrokerService.startAllConnectors(), the transports are started first, then the
NetworkConnectors.  As part of starting the network connectors, their constructor takes a
collection obtained by calling getBroker().getDurableDestinations().  Normally this list
would be empty.  However, if clients connect before this is called, a list is returned for
each topic subscribed to.  Then, instead of creating standard TopicSubscriptions for the
network connector, DurableTopicSubscriptions are created.  I'm not sure whether that by
itself should be a problem, but it is, because SimpleDispatchPolicy, in the process of
iterating through the DurableTopicSubscriptions, causes messages to be queued up for
prefetch without clearing all of the references (for each pass it looks like three
references are registered and only two are cleared).  This becomes a memory leak.  In the
logs you see a message saying the PrefetchLimit was reached, and then you start seeing logs
about memory usage increasing until it reaches 100% and everything stops.
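To make the ordering concrete, here is a minimal, self-contained sketch of the window between the two start phases.  The class and method names here are hypothetical illustrations, not the actual ActiveMQ code; the real logic lives in BrokerService.startAllConnectors():

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the start-up ordering: transports open first, so a
// fast subscriber can register a durable subscription before the network
// connectors ever read the durable-destination list.
class StartupRaceSketch {
    private final List<String> durableDestinations = new ArrayList<>();

    void startTransportConnectors() {
        // Clients can now connect.  Simulate a subscriber landing in the
        // race window and registering a durable topic subscription.
        durableDestinations.add("topic://Foo");
    }

    List<String> startNetworkConnectors() {
        // Runs AFTER the transports opened, so it may observe the early
        // subscriber's entry and build DurableTopicSubscriptions for the
        // network bridge instead of plain TopicSubscriptions.
        return new ArrayList<>(durableDestinations);
    }

    public static void main(String[] args) {
        StartupRaceSketch broker = new StartupRaceSketch();
        broker.startTransportConnectors();   // window opens here
        List<String> seen = broker.startNetworkConnectors();
        System.out.println(seen.isEmpty()
                ? "connectors saw an empty list"
                : "connectors saw durable destinations: " + seen);
    }
}
```

In the real broker the window is only a few milliseconds wide, which is why many restarts (or a breakpoint) are needed to hit it.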

To reproduce this, create a network-of-brokers configuration of at least 3 brokers -- the
more you have, the more likely you are to hit this without a lot of tries, so I suggest a
bunch.  Start all brokers.  Establish a publisher on broker A using
failover://(tcp://localhost:61610), then establish a bunch of subscribers on all the
brokers using similar URIs, i.e., failover://(tcp://localhost:61610),
failover://(tcp://localhost:61620).  The more you have on broker 'A' the better, since you
are trying to reproduce the race condition.  You want the others up so that the other
brokers expect messages to be passed to them.  Once everybody is up and happy, kill broker
A and restart it.  If you do that enough times, you will hit the race condition and the
memory leak will start.  You can also put a breakpoint in BrokerService.startAllConnectors()
after the transports are started but before the network connectors are started.  That'll
give clients time to connect to the transport threads before you tell the VM to continue.

I found it an easy fix to store the durable destination list in a local variable before
starting the transports and to pass that to the network connectors instead of making
separate calls.  I'm not sure if there are 'normal' ways for that list to be anything other
than empty.  If not, you could just pass an empty set to the network connectors, but I
suspect there are legitimate configurations that may need this list to be requested.  If
so, this memory leak would likely occur in those cases, too.
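A self-contained sketch of that fix idea -- snapshot the list before the transports open, then feed the snapshot to the network connectors.  Again, the names are hypothetical stand-ins, not the actual BrokerService code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the suggested fix: take the durable-destination
// snapshot BEFORE transports start accepting clients, so a subscriber that
// sneaks in during start-up cannot influence what the connectors see.
class SnapshotFixSketch {
    private final List<String> durableDestinations = new ArrayList<>();

    List<String> startAllConnectors() {
        // 1. Snapshot while no client can possibly have connected yet.
        List<String> snapshot =
                Collections.unmodifiableList(new ArrayList<>(durableDestinations));
        // 2. Open transports; an early subscriber may now mutate the live list...
        durableDestinations.add("topic://Foo"); // simulated early subscriber
        // 3. ...but the network connectors receive the pre-transport snapshot.
        return snapshot;
    }

    public static void main(String[] args) {
        List<String> seenByConnectors = new SnapshotFixSketch().startAllConnectors();
        System.out.println("connectors saw " + seenByConnectors.size() + " durable destinations");
    }
}
```

With the snapshot taken first, the connectors build plain TopicSubscriptions regardless of how quickly clients reconnect.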

I ran into this in 4.1.2.  I haven't tested 5.0 since our attempts to switch to 5.0 were met
with failure due to the number of bugs in 5.0 (already reported by others).  Looking at 5.0.0
source, the race condition is still there in BrokerService.startAllConnectors() so I suspect
the memory leak is there as well.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

