Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@activemq.apache.org
Date: Mon, 10 Nov 2014 17:23:34 +0000 (UTC)
From: "Pete Bertrand (JIRA)" <jira@apache.org>
To: dev@activemq.apache.org
Message-ID: <JIRA.12753265.1415236209000.461314.1415640214586@Atlassian.JIRA>
In-Reply-To: <JIRA.12753265.1415236209000@Atlassian.JIRA>
References: <JIRA.12753265.1415236209000@Atlassian.JIRA>
 <JIRA.12753265.1415236209225@arcas>
Subject: [jira] [Updated] (AMQ-5424) Broker at 100% CPU when idle after
 Network Connection reconnect with duplicates sent
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/AMQ-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pete Bertrand updated AMQ-5424:
-------------------------------
    Fix Version/s: NEEDS_REVIEW

> Broker at 100% CPU when idle after Network Connection reconnect with duplicates sent
> ------------------------------------------------------------------------------------
>
>                 Key: AMQ-5424
>                 URL: https://issues.apache.org/jira/browse/AMQ-5424
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.10.0
>            Reporter: Pete Bertrand
>             Fix For: NEEDS_REVIEW
>
>         Attachments: activemq.xml, thread-dump.txt
>
>
> In a network of 2 brokers (A and B) with durable queued messages
> going from A to B over a duplex NetworkConnector,
> if A is stopped and restarted while messages are in flight,
> and if replayed messages from A are recognized as duplicates on B,
> then 30 seconds after B goes idle, B's CPU goes to 100%.
> I have attached the thread dump to the ticket.
> From what I have been able to figure out, the dequeue counter is not incremented
> when the duplicate is moved into the DLQ, so the counters show a pending message
> that is no longer in the persisted queue. When the scheduler kicks in 30 seconds
> after the broker goes idle, it says "I have a pending message, fetch it from the DB",
> but the fetch returns 0 messages. Because the pending count has not changed, the
> scheduler immediately sees the pending message again and repeats the DB fetch, again
> with no results. This tight loop is where the CPU is spinning; see the attached thread dump.
> So, in detail:
> It appears that after A is restarted and it replays messages that have not been ACKed,
> B receives duplicate messages and sends them to the DLQ. Here is the warning from the log:
> {noformat}
>   WARN | duplicate message from store ID:host-lnx-59946-1415221396197-1:1:1:1:468, redirecting for dlq processing | org.apache.activemq.broker.region.Queue | ActiveMQ VMTransport: vm://broker1#11-1
> {noformat}
> Once all messages have been delivered, the brokers have been idle for 30 seconds and the CPU on B is at 100%, the WebConsole shows the following for the queues on B:
> {noformat}
> Queue Name     Number Of          Number Of   Messages   Messages
>                Pending Messages   Consumers   Enqueued   Dequeued
> ActiveMQ.DLQ   1                  0           1          0
> TEST.FOO       1                  1           469        468
> {noformat}
> On this test run, only one message was a duplicate. It was moved to the DLQ, but the TEST.FOO counters still show it as pending. The counters are out of sync with the actual messages in the persisted queue, because the duplicate is now in the DLQ rather than in TEST.FOO.
> At this point if you purge TEST.FOO, CPU on B goes back to normal because this clears the pending message counter.
> +*Steps to reproduce*+
> Set up 2 brokers as follows:
>   *producer* ==> *broker-A*  <==  duplex network connection  ==>  *broker-B* ==>  *consumer*
> 1) Download the binary distribution of AMQ 5.10.0 and extract apache-activemq-5.10.0-bin.tar.gz
> 2) Create two brokers
> {noformat}
>  $ ACTIVEMQ_HOME/bin/activemq create /path/to/brokers/broker-a
>  $ ACTIVEMQ_HOME/bin/activemq create /path/to/brokers/broker-b
> {noformat}
> 3) Update broker-a to connect to broker-b with a duplex network connection.
>    _You can use the attached *activemq.xml*_. It does the following:
> - Sets the transport connector for broker-a to port 61610
> - Sets up a networkConnector from broker-a to broker-b on port 61616
> - Does not start the Jetty web console on broker-a, to avoid a port conflict
> broker-b is unmodified and uses the default port 61616.
> 4) Start the brokers
> {noformat}
>  $ broker-a/bin/broker-a start
>  $ broker-b/bin/broker-b start
> {noformat}
> 5) Start a consumer connected to broker-b and a producer connected to broker-a
> {noformat}
>  $ ant consumer -Durl=tcp://localhost:61616 -Ddurable=true
>  $ ant producer -Durl=tcp://localhost:61610 -Ddurable=true
> {noformat}
> 6) Stop broker-a before the producer has finished sending messages, then restart it
> {noformat}
>  $ broker-a/bin/broker-a stop
>  $ broker-a/bin/broker-a start
> {noformat}
> 7) Check the broker-b log for duplicate warnings, and check the broker-b web console for pending messages:
>  http://localhost:8161/admin/queues.jsp
> 8) 30 seconds after broker-b goes idle, its CPU usage goes to 100%.
> 9) Purge TEST.FOO on broker-b: the pending message count resets and CPU usage returns to normal.
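
The counter arithmetic in the TEST.FOO row of the WebConsole table can be reproduced with a small Java model. This is a minimal sketch, not ActiveMQ code: the class names PendingCounterDrift and QueueModel are hypothetical, and so is the assumption that "pending" is derived as enqueued minus dequeued (the table's numbers are consistent with that, but the broker's internals are not shown here). The duplicate path simply follows the report: the message leaves the queue for the DLQ without the dequeue counter being incremented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the TEST.FOO counters reported in AMQ-5424.
// Not ActiveMQ code: it only reproduces the arithmetic of the table.
public class PendingCounterDrift {

    static final class QueueModel {
        long enqueued;                                   // "Messages Enqueued"
        long dequeued;                                   // "Messages Dequeued"
        final Deque<String> store = new ArrayDeque<>();  // persisted messages
        final List<String> dlq = new ArrayList<>();      // stand-in for ActiveMQ.DLQ

        void enqueue(String id) {
            store.addLast(id);
            enqueued++;
        }

        // Normal path: the message is delivered, acked and counted.
        void deliverAndAck() {
            store.pollFirst();
            dequeued++;
        }

        // Path described in the report: a duplicate leaves the queue for the
        // DLQ, but the dequeue counter is not incremented.
        void redirectDuplicateToDlq() {
            dlq.add(store.pollLast());
        }

        // Assumption: pending is derived from the two counters.
        long pending() {
            return enqueued - dequeued;
        }
    }

    // Replays the reported test run: 468 unique messages, then one replay.
    static QueueModel simulate() {
        QueueModel testFoo = new QueueModel();
        for (int i = 1; i <= 468; i++) {
            testFoo.enqueue("msg-" + i);
            testFoo.deliverAndAck();
        }
        testFoo.enqueue("msg-468");          // replayed by broker A after its restart
        testFoo.redirectDuplicateToDlq();    // detected as a duplicate on broker B
        return testFoo;
    }

    public static void main(String[] args) {
        QueueModel q = simulate();
        System.out.printf("enqueued=%d dequeued=%d pending=%d inStore=%d inDlq=%d%n",
                q.enqueued, q.dequeued, q.pending(), q.store.size(), q.dlq.size());
    }
}
```

Running main prints enqueued=469 dequeued=468 pending=1 inStore=0 inDlq=1: the counters report one pending message while the modelled store holds none, which is the mismatch shown for TEST.FOO.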
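
The idle spin described in the report can be sketched the same way, as a dispatcher that trusts a pending counter and keeps paging from the store while the counter is above zero. This is a hypothetical illustration, not ActiveMQ's dispatch code: the names IdleSpinSketch, Dispatcher, fetchFromStore and purge are made up, the 30-second idle trigger is not modelled, the maxRounds cap exists only so the demo terminates, and purge() is assumed to realign the counter with the store, matching the effect observed when TEST.FOO is purged.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the spin described in AMQ-5424. The names are
// illustrative only; this is not ActiveMQ's actual dispatch code.
public class IdleSpinSketch {

    static final class Dispatcher {
        long pendingCount;                                // what the counters claim
        final Deque<String> store = new ArrayDeque<>();   // what is really persisted

        Dispatcher(long pendingCount) {
            this.pendingCount = pendingCount;
        }

        // Stand-in for the store/DB query; returns at most maxBatch messages.
        List<String> fetchFromStore(int maxBatch) {
            List<String> batch = new ArrayList<>();
            while (batch.size() < maxBatch && !store.isEmpty()) {
                batch.add(store.pollFirst());
            }
            return batch;
        }

        // Keeps fetching while the counter says messages are pending.
        // maxRounds exists only so the demo terminates; returns the number
        // of fetches that came back empty.
        int dispatch(int maxRounds) {
            int emptyFetches = 0;
            for (int round = 0; round < maxRounds && pendingCount > 0; round++) {
                List<String> batch = fetchFromStore(100);
                if (batch.isEmpty()) {
                    // Nothing delivered, so pendingCount never drops and the
                    // loop condition stays true: the next query follows at once.
                    emptyFetches++;
                    continue;
                }
                pendingCount -= batch.size();
            }
            return emptyFetches;
        }

        // Assumption: purge empties the store and realigns the counter with it.
        void purge() {
            store.clear();
            pendingCount = 0;
        }
    }

    public static void main(String[] args) {
        Dispatcher brokerB = new Dispatcher(1); // counters say 1 pending, store is empty
        System.out.println("empty fetches before purge: " + brokerB.dispatch(1_000_000));
        brokerB.purge();
        System.out.println("empty fetches after purge:  " + brokerB.dispatch(1_000_000));
    }
}
```

In this model every round before the purge is an empty fetch (1000000 of them), and none happen afterwards. Without the artificial cap the first loop would never exit, which is consistent with the busy thread seen in the thread dump.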


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)