activemq-users mailing list archives

From Stefanic <>
Subject Re: Consumers hanging on a queue although there are messages in it
Date Thu, 05 Apr 2018 06:08:19 GMT
It has been a while so here's an update.
The same problem has been occurring on and off for the past two months now
and there is one suspect always coming back: message grouping.

We have found and tried several things; here are some of the findings:

*Message groups cache*
ActiveMQ defaults to an LRU cache of size 1024 for storing hashes of the
JMS message group header (JMSXGroupID).
We were using more groups than that and could not find where to change this
setting, so we reduced the number of message groups to 1024 in our code.
That did not help with the 'hanging' problem at all.
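
For illustration, capping the number of groups in code could look like the
sketch below. It is a guess at the general approach, not our actual code:
the class name GroupBuckets, the method groupIdFor and the "group-" prefix
are all hypothetical. It hashes an arbitrary business key into one of 1024
buckets, so the broker never sees more than 1024 distinct group IDs.

```java
final class GroupBuckets {
    // Upper bound on distinct group IDs, matching the cache size mentioned above.
    static final int MAX_GROUPS = 1024;

    // Maps an arbitrary key to one of MAX_GROUPS stable group IDs.
    // String.hashCode() is specified by the Java API, so the mapping is
    // the same on every JVM. floorMod keeps the bucket non-negative even
    // when hashCode() is negative.
    static String groupIdFor(String businessKey) {
        int bucket = Math.floorMod(businessKey.hashCode(), MAX_GROUPS);
        return "group-" + bucket;
    }

    public static void main(String[] args) {
        // Hypothetical key; messages with the same key always share a group.
        System.out.println(groupIdFor("order-42"));
    }
}
```

The resulting string would then be set as the JMSXGroupID string property
on each outgoing message. Keys that collide in a bucket share a group and
therefore a consumer, which is the price of the cap.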

*Broker page size*
Because the ActiveMQ broker sends all messages of a group to a single
consumer, it has to page messages into memory before it can dispatch them.
When every message in memory belongs to a single consumer, the remaining
messages in the queue are not processed. maxPageSize is the parameter that
lets the broker load more messages into memory, in the hope that it finds
messages for another consumer so that the flow is not heavily impacted.

That paging problem with message groups, combined with some kind of bug in
the client and/or broker, seems to trigger the hanging state.
When we simulate a lot of messages for a single broker, even within the max
page size, we encounter the hanging state issue (although lately in another
variant, more on that below). Strangely, after a restart of the client and a
broker failover the hanging state disappears, so it must be something that
builds up while running for a while rather than just a full queue at startup.

After changing the maxPageSize (increasing it from 1000 to 10000) we did see
a major decline in incidents, so that definitely has an effect, and it
supports the theory above about what causes the hanging state.

The hanging state we have encountered recently involves the failover
transport handler in the client, which seems to think the broker is down or
unresponsive and blocks all consumers for a specific timeout (3 seconds by
default, I think). After that timeout everything continues for a few seconds
and then the timeout is triggered again, in an endless loop.
The only way we know to stop this is to restart the client and perform a
broker failover.

*Next steps*
We are now researching how the number of consumers, maxPageSize and the
client prefetch settings interact with each other, to hopefully find a good
value for each of those parameters. This matters mostly because the number
of consumers directly affects the number of message groups per consumer.
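
As a rough back-of-the-envelope aid, the relation between consumers and
groups per consumer can be written down as below. It assumes the groups are
spread evenly over the consumers, which the broker does not guarantee;
GroupSpread and maxGroupsPerConsumer are hypothetical names.

```java
final class GroupSpread {
    // Largest number of groups any one consumer holds when `groups` groups
    // are spread as evenly as possible over `consumers` consumers.
    static int maxGroupsPerConsumer(int groups, int consumers) {
        if (consumers <= 0) {
            throw new IllegalArgumentException("consumers must be positive");
        }
        return (groups + consumers - 1) / consumers; // ceiling division
    }

    public static void main(String[] args) {
        for (int consumers : new int[] {1, 4, 8, 16}) {
            System.out.println(consumers + " consumers: "
                    + maxGroupsPerConsumer(1024, consumers) + " groups each");
        }
    }
}
```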

We also upgraded to the latest ActiveMQ and Camel client libraries and the
latest ActiveMQ broker.
The upgraded broker has been running for quite some time now and the issues
have continued; the client library update will be released to production soon.
