Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@activemq.apache.org
Date: Tue, 27 Mar 2012 12:30:27 +0000 (UTC)
From: "Gary Tully (Issue Comment Edited) (JIRA)" <jira@apache.org>
To: dev@activemq.apache.org
Message-ID: 
 <360654389.22930.1332851427807.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <1862662382.10532.1314215189046.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (AMQ-3473) Messages (possibly) stuck
 and pending messages count showing high number of pending message which do
 not get sent to a consumer.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/AMQ-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239404#comment-13239404 ] 

Gary Tully edited comment on AMQ-3473 at 3/27/12 12:28 PM:
-----------------------------------------------------------

There is a problem here with the enqueue counter. If a duplicate message send occurs across a network connector and the message has not yet been dispatched, the store adds the message to the journal, but the index update recognizes the duplicate. The index is updated to reference the last journal insert, but the enqueue counter is not decremented to reflect the duplicate.

The duplicate send can occur if the connection or broker dies after a send but before a reply is received. The message remains unacked and gets redispatched. The redelivery counter will be incremented to reflect the resend in this case.
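
To make the counter drift concrete, here is a minimal toy model of the behaviour described above. It is not the KahaDB code: ToyStore, its fields and the correctOnDuplicate flag are hypothetical names invented for illustration, and the decrement on a detected duplicate is only one assumed way the statistic could be brought back in line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT ActiveMQ's store implementation. All names are hypothetical.
final class ToyStore {
    private final List<String> journal = new ArrayList<>();      // append-only record of every add
    private final Map<String, Integer> index = new HashMap<>();  // message id -> last journal position
    private final boolean correctOnDuplicate;                    // assumed fix: adjust the statistic
    private long enqueueCount;                                   // destination enqueue statistic

    ToyStore(boolean correctOnDuplicate) {
        this.correctOnDuplicate = correctOnDuplicate;
    }

    void send(String messageId) {
        enqueueCount++;                 // the statistic is bumped for every send, duplicate or not
        journal.add(messageId);         // the journal write always happens
        // The index recognizes the duplicate and is re-pointed at the last journal insert.
        Integer previous = index.put(messageId, journal.size() - 1);
        if (previous != null && correctOnDuplicate) {
            enqueueCount--;             // reflect the detected duplicate in the statistic
        }
    }

    long enqueueCount()  { return enqueueCount; }
    int storedMessages() { return index.size(); }
    int journalEntries() { return journal.size(); }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            ToyStore store = new ToyStore(fix);
            store.send("m1");
            store.send("m2");
            store.send("m2");           // resend after a lost reply
            System.out.println("fix=" + fix
                    + " enqueue=" + store.enqueueCount()
                    + " stored=" + store.storedMessages());
        }
    }
}
```

Without the correction, the sketch reports three enqueues while the index holds only two messages, which is the kind of gap that shows up as a pending count that never drains to 0.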

In 5.6 (https://issues.apache.org/jira/browse/AMQ-3576), there is an optional boolean transportConnector.auditNetworkProducers attribute that can be used to add a producer audit for a network connector. This will suppress the duplicate send in this case. It is disabled by default because it can prevent legitimate duplicate messages from non-conduit topics and virtual destinations from propagating across the network.

The producer audit is a good solution, but if the audit cache is exceeded, the duplicate detection by the store should be reflected in the destination enqueue statistic.
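
As a rough illustration of why a bounded audit cannot be the only safeguard, the following sketch remembers a fixed number of message ids and forgets the oldest. It is deliberately simplified and is not how the broker's producer audit is implemented: ToyProducerAudit, its capacity parameter and the id-based eviction are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for a bounded duplicate audit. Hypothetical names, not broker code.
final class ToyProducerAudit {
    private final Map<String, Boolean> seen;

    ToyProducerAudit(final int capacity) {
        // Insertion-ordered map that drops its eldest entry once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the id is still remembered; otherwise records it and returns false.
    boolean isDuplicate(String messageId) {
        return seen.put(messageId, Boolean.TRUE) != null;
    }

    public static void main(String[] args) {
        ToyProducerAudit audit = new ToyProducerAudit(2);
        System.out.println(audit.isDuplicate("a"));  // first sight of "a"
        System.out.println(audit.isDuplicate("a"));  // resend caught while "a" is cached
        audit.isDuplicate("b");
        audit.isDuplicate("c");                      // cache is full, "a" is evicted
        System.out.println(audit.isDuplicate("a"));  // resend slips past the audit
    }
}
```

Once an id has been evicted, a late resend passes the audit and reaches the store, so the store's own duplicate detection is the last point at which the enqueue statistic can be corrected.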
                
                  
> Messages (possibly) stuck and pending messages count showing high number of pending message which do not get sent to a consumer.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-3473
>                 URL: https://issues.apache.org/jira/browse/AMQ-3473
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Message Store
>    Affects Versions: 5.5.0
>         Environment: Ubuntu 11.04 (64-bit)
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>            Reporter: Mat Sharpe
>         Attachments: activemq.xml
>
>
> Two brokers, each with a network connection to the other. We have two producers producing persistent messages to a single queue at a rate of 20-50/second. There is a single consumer. All clients prefer the primary broker.
> The consumer is 'bursty' - i.e. it grabs 5000 messages and then processes them. During processing, new messages build up on the broker.
> If the primary broker is restarted, it comes back, as you would expect, with a number of pending messages. This message count never fully returns to 0, even if the producers are stopped, and browsing the queue through the GUI shows either no messages or only messages that were produced since the restart.
> I have turned on Kaha debugging and, after the initial restart, we see the following during every checkpoint:
>  [eckpoint Worker] TRACE MessageDatabase                - Last update: 3974:2450180, full gc candidates set: [3950, 3951, 3973, 3974]
> ...
>  [eckpoint Worker] TRACE MessageDatabase                - gc candidates after dest:1:MyQueue, [3951, 3973]
> ...
>  [eckpoint Worker] TRACE MessageDatabase                - gc candidates: [3951, 3973]
>  [eckpoint Worker] TRACE MessageDatabase                - not removing data file: 3951 as contained ack(s) refer to referenced file: [3950, 3951]
>  [eckpoint Worker] DEBUG MessageDatabase                - Cleanup removing the data files: [3973]
> (I assume that is supposed to say '[Checkpoint Worker]', incidentally)
> After the second restart we will see many:
>  [0.8.0.200:47300] WARN  MessageDatabase                - Duplicate message add attempt rejected. Destination: MyQueue, Message id: ID:node001-58675-1314038640553-0:17:1:1:470776
> Followed by:
>  [eckpoint Worker] TRACE MessageDatabase                - Last update: 3974:13515407, full gc candidates set: [3950, 3951, 3974]
> ...
>  [eckpoint Worker] TRACE MessageDatabase                - gc candidates after dest:1:MyQueue, [3951]
> ...
>  [eckpoint Worker] TRACE MessageDatabase                - gc candidates: [3951]
>  [eckpoint Worker] DEBUG MessageDatabase                - Cleanup removing the data files: [3951]
> This is very similar, if not identical, to AMQ-2955. I have tried setting 'useCache=false', but this does not rectify the issue. This could also be a similar issue to AMQ-3281.
> I will attach a config. Please advise if you would like me to enable further debugging.
> I don't currently have a test harness that replicates this issue, and because this is only happening in our production environment, I'm unable to verify reliably whether messages are being lost or delayed, or if this is purely a stats issue.
