activemq-issues mailing list archives

From "Klaus Pittig (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQ-6115) No more browse/consume possible after #checkpoint run
Date Wed, 13 Jan 2016 15:45:39 GMT

    [ https://issues.apache.org/jira/browse/AMQ-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096370#comment-15096370 ]

Klaus Pittig commented on AMQ-6115:
-----------------------------------

Just to complete the record with the information already discussed in the corresponding mailing list thread: http://activemq.2283324.n4.nabble.com/How-to-avoid-blocking-of-queue-browsing-after-ActiveMQ-checkpoint-call-td4705696.html

Tim Bain:
{quote}
I believe you are correct: browsing a persistent queue uses bytes from the 
memory store, because those bytes must be read from the persistence store 
into the memory store before they can be handed off to browsers or 
consumers.  If all available bytes in the memory store are already in use, 
the messages can't be paged into the memory store, and so the operation 
that required them to be paged in will hang/fail. 

You can work around the problem by increasing your memory store size via 
trial-and-error until the problem goes away.  Note that the broker itself 
needs some amount of memory, so you can't give the whole heap over to the 
memory store or you'll risk getting OOMs, which means you may need to 
increase the heap size as well.  You can estimate how much memory the 
broker needs aside from the memory store by subtracting the bytes used for 
the memory store (539 MB) from the total heap bytes used as measured via 
JConsole or similar tools.  I'd double (or more) that number to be safe, if 
it was me; the last thing I want to deal with in a production application 
(ActiveMQ or anything else) is running out of memory because I tried to cut 
the memory limits too close just to save a little RAM. 

All of that is how to work around the fact that before you try to browse 
your queue, something else has already consumed all available bytes in the 
memory store.  If you want to dig into why that's happening, we'd need to 
try to figure out what those bytes are being used for and whether it's 
possible to change configuration values to reduce the usage so it fits into 
your current limit.  There will definitely be more effort required than 
simply increasing the memory limit (and max heap size), but we can try if 
you're not able to increase the limits enough to fix the problem. 

If you want to go down that path, one thread to pull on is your observation 
that you "can browse/consume some Queues  _until_ the #checkpoint call 
after 30 seconds."  I assume from your reference to checkpointing that 
you're using KahaDB as your persistence store.  Can you post the KahaDB 
portion of your config? 

Your statements here and in your StackOverflow post 
(http://stackoverflow.com/questions/34679854/how-to-avoid-blocking-of-queue-browsing-after-activemq-checkpoint-call) 
indicate that you think that the problem is that memory isn't getting 
garbage collected after the operation that needed it (i.e. the checkpoint) 
completes, but it's also possible that the checkpoint operation isn't 
completing because it can't get enough messages read into the memory 
store.  Have you confirmed via the thread dump that there is not a 
checkpoint operation still in progress?  Also, how large are your journal 
files that are getting checkpointed?  If they're large enough that all 
messages for one file won't fit into the memory store, you might be able to 
prevent the problem by using smaller files. 
{quote}

a.) Regarding your last answer (thanks for your effort by the way):

I'm aware of the relation between the heap and the systemUsage memoryLimit, and we make sure
that there are no illogical settings.
The primary requirement is a stable system that runs 'forever' without memory issues
at any time, independent of the load/throughput.
No one really wants to deal with memory settings that sit right at the edge of their limits.

You're right: the memory is completely consumed. And I can't confirm that the checkpoint/cleanup
ever finishes completely, so the system can stall without giving GC a chance to release
some memory.

It's the expiry check that causes this. The persistent stores themselves seem to be managed as
expected (no issues, no inconsistencies, no loss);
our situation is independent of the storage (reproducible with both LevelDB and KahaDB). For KahaDB
we have been using 16mb journal files for years (this saves a huge amount of space otherwise required
for pending messages that are not consumed for days due to offline situations on the client side).
Anyway, here is the current configuration you requested:

{code:xml}
<persistenceAdapter>
    <kahaDB directory="${activemq.base}/data/kahadb" enableIndexWriteAsync="true"
            journalMaxFileLength="16mb" indexWriteBatchSize="10000" indexCacheSize="10000" />
    <!--
    <levelDB directory="${activemq.base}/data/leveldb" logSize="33554432" />
    -->
</persistenceAdapter>
{code}

b.) Some proposal concerning AMQ-6115:

From my point of view, it is worth discussing the single memoryLimit parameter that is shared by
the regular browse/consume threads and the checkpoint/cleanup threads.
There should always be enough space left to browse/consume any queue, at least with a prefetch of 1,
i.e. for one of the next pending messages.
Maybe two well-balanced memoryLimit parameters that prioritize consumption over
checkpoint/cleanup would allow for better regulation here, or something along those lines.
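
To make this concrete, here is a purely hypothetical sketch of what such a split could look like. Note that neither a "consumption" nor a "checkpoint" memory limit exists in ActiveMQ today; the snippet only illustrates the proposal, reusing the existing systemUsage XML style:

{code:xml}
<!-- HYPOTHETICAL, not valid ActiveMQ configuration: two well-balanced limits,
     with consumption prioritized over the checkpoint/cleanup expiry scan. -->
<systemUsage>
    <systemUsage sendFailIfNoSpace="true">
        <!-- reserved for browse/consume, so at least a prefetch of 1 always fits -->
        <consumptionMemoryUsage limit="400 mb"/>
        <!-- upper bound for message paging during #checkpoint/cleanup -->
        <checkpointMemoryUsage limit="100 mb"/>
    </systemUsage>
</systemUsage>
{code}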


c.) Our results and an acceptable solution so far:

After a thorough investigation (without changing the ActiveMQ source code), our conclusion for now is
that we need to accept the limitations imposed by the single memoryLimit parameter used both for
the #checkpoint/cleanup process and for browsing/consuming queues.

**1.) Memory**

There is no problem if we use a much higher memoryLimit (together
with a higher max-heap) to support both the per-destination message caching
during the #checkpoint/cleanup workflow and our requirement to browse/consume
messages.

But more memory is not an option in our scenario; we have to get by with a 1024m max-heap and
a 500m memoryLimit.

Besides that, constantly raising the memoryLimit just because there are more persistent queues
holding hundreds/thousands of pending messages, combined with certain offline/inactive consumer
scenarios, is something that should be discussed in detail (IMHO).
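
For context, a minimal sketch of the constraint we have to work within; the values are the ones from our environment, and the comment only restates the sizing estimate from the quoted answer above:

{code:xml}
<!-- Fixed constraints in our scenario: -Xmx1024m and a 500 mb memory store.
     Per the sizing estimate quoted above, roughly 1024 mb - 500 mb = ~500 mb of heap
     remains for the broker itself and everything outside the memory store,
     which is why simply raising the memoryLimit is not an option for us. -->
<systemUsage>
    <systemUsage sendFailIfNoSpace="true">
        <memoryUsage>
            <memoryUsage limit="500 mb"/>
        </memoryUsage>
    </systemUsage>
</systemUsage>
{code}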


**2.) Persistent Adapters**

We ruled out the persistence adapters as the cause of the problem, because the behaviour doesn't
change if we switch between different types of persistent stores (KahaDB, LevelDB, JDBC-PostgreSQL).

During the debugging sessions with KahaDB we also see regular checkpoint handling; the storage
is managed as expected.
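
For completeness, this is roughly how the JDBC-PostgreSQL variant was wired up for those tests (a sketch only; the bean id, host and credentials are placeholders rather than our actual values):

{code:xml}
<!-- Sketch of the JDBC persistence adapter used to rule out the store type. -->
<persistenceAdapter>
    <jdbcPersistenceAdapter dataDirectory="${activemq.base}/data" dataSource="#postgres-ds"/>
</persistenceAdapter>

<!-- PostgreSQL DataSource; placeholder connection settings -->
<bean id="postgres-ds" class="org.postgresql.ds.PGSimpleDataSource">
    <property name="serverName" value="localhost"/>
    <property name="databaseName" value="activemq"/>
    <property name="user" value="activemq"/>
    <property name="password" value="activemq"/>
</bean>
{code}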


**3.) Destination Policy / Expiration Check**

Our problem disappears completely if we disable caching and the expiration check, the latter being
the actual cause of the problem.

The corresponding properties are documented and there is a nice blog article about Message
Priorities with a description quite suitable for our scenario:

- http://activemq.apache.org/how-can-i-support-priority-queues.html
- http://blog.christianposta.com/activemq/activemq-message-priorities-how-it-works/

We simply added useCache="false" and expireMessagesPeriod="0" to the
policyEntry:

{code:xml}
<destinationPolicy>
    <policyMap>
        <policyEntries>
            <policyEntry queue=">" producerFlowControl="false" optimizedDispatch="true"
                         memoryLimit="128mb" timeBeforeDispatchStarts="1000"
                         useCache="false" expireMessagesPeriod="0">
                <dispatchPolicy>
                    <strictOrderDispatchPolicy />
                </dispatchPolicy>
                <pendingQueuePolicy>
                    <storeCursor />
                </pendingQueuePolicy>
            </policyEntry>
        </policyEntries>
    </policyMap>
</destinationPolicy>
{code}

The consequences of no longer using in-memory caching and never checking for message
expiration are clear.
Since we use neither message expiration nor message priorities, and the current message dispatching
is fast enough for us, this trade-off is acceptable given the system limitations.

One should also think about well-defined prefetch limits to control memory consumption during specific
workflows. Message sizes in our scenario range from 2 bytes up to approx. 100 KB, so more individual
policyEntries and client consumer configurations could help optimize system behaviour
with respect to performance and memory usage (see http://activemq.apache.org/per-destination-policies.html); a sketch follows below.
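
As an illustration of that last point, a sketch of a more specific policyEntry for a queue carrying large messages (the queue name and limits are examples, not our production values):

{code:xml}
<!-- Example only: dedicated policy for a bulk queue with large messages,
     keeping its share of the memory store small. -->
<policyEntry queue="BULK.>" producerFlowControl="false" optimizedDispatch="true"
             memoryLimit="32mb" useCache="false" expireMessagesPeriod="0">
    <pendingQueuePolicy>
        <storeCursor/>
    </pendingQueuePolicy>
</policyEntry>
{code}

On the client side, the prefetch can be limited per destination via a destination option such as {{queue://BULK.QUEUE?consumer.prefetchSize=1}}, or globally on the connection URI with {{jms.prefetchPolicy.queuePrefetch=1}} (see http://activemq.apache.org/what-is-the-prefetch-limit-for.html).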




> No more browse/consume possible after #checkpoint run
> -----------------------------------------------------
>
>                 Key: AMQ-6115
>                 URL: https://issues.apache.org/jira/browse/AMQ-6115
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: activemq-leveldb-store, Broker, KahaDB
>    Affects Versions: 5.5.1, 5.11.2, 5.13.0
>         Environment: OS=Linux,MacOS,Windows, Java=1.7,1.8, Xmx=1024m, SystemUsage Memory Limit 500 MB, Temp Limit 1 GB, Storage 80 GB
>            Reporter: Klaus Pittig
>         Attachments: Bildschirmfoto 2016-01-08 um 12.09.34.png, Bildschirmfoto 2016-01-08 um 13.29.08.png
>
>
> We are currently facing a problem when using ActiveMQ with a large number of persistent queues (250), each holding 1000 persistent TextMessages of 10 KB.
> Our scenario requires these messages to remain in the store for a long time (days) until they are consumed (large amounts of data are staged for distribution to many consumers, which may be offline for some days).
> This issue is independent of the JVM, OS and persistence adapter (KahaDB, LevelDB), given enough free space and memory.
> We tested this behaviour with ActiveMQ: 5.11.2, 5.13.0 and 5.5.1.
> After the persistence store has been filled with these messages (we use a simple unit test that always produces the same message) and the broker has been restarted, we can browse/consume some queues _until_ the #checkpoint call after 30 seconds.
> This call causes the broker to use all available memory and never release it for other tasks such as queue browsing/consuming. Internally the MessageCursor seems to decide that there is not enough memory and stops delivering queue content to browsers/consumers.
> => Is there a way to avoid or fix this behaviour? 
> The expectation is that we can consume/browse any queue under all circumstances.
> Besides the above-mentioned settings, we use the following broker configuration (btw: changing the memoryLimit to a lower value like 1mb does not change the situation):
> {code:xml}
>         <destinationPolicy>
>             <policyMap>
>               <policyEntries>
>                 <policyEntry queue=">" producerFlowControl="false"
> optimizedDispatch="true" memoryLimit="128mb">
>                   <dispatchPolicy>
>                     <strictOrderDispatchPolicy />
>                   </dispatchPolicy>
>                   <pendingQueuePolicy>
>                     <storeCursor/>
>                   </pendingQueuePolicy>
>                 </policyEntry>
>               </policyEntries>
>             </policyMap>
>         </destinationPolicy>
>         <systemUsage>
>             <systemUsage sendFailIfNoSpace="true">
>                 <memoryUsage>
>                     <memoryUsage limit="500 mb"/>
>                 </memoryUsage>
>                 <storeUsage>
>                     <storeUsage limit="80000 mb"/>
>                 </storeUsage>
>                 <tempUsage>
>                     <tempUsage limit="1000 mb"/>
>                 </tempUsage>
>             </systemUsage>
>         </systemUsage>
> {code}
> Setting the *cursorMemoryHighWaterMark* in the destinationPolicy to a higher value like *150* or *600*, depending on the difference between memoryUsage and the available heap space, relieves the situation a bit as a workaround, but from my point of view this is not really an option for production systems.
> Attached is some information from Oracle Mission Control and JProfiler showing the ActiveMQTextMessage instances that are never released from memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
