qpid-users mailing list archives

From Ramayan Tiwari <ramayan.tiwari@gmail.com>
Subject Re: Java broker OOM due to DirectMemory
Date Thu, 20 Apr 2017 01:52:34 GMT
Another issue we noticed: when the broker goes OOM due to direct memory, it
doesn't create a heap dump (requested via "-XX:+HeapDumpOnOutOfMemoryError"),
even though the error is the same "java.lang.OutOfMemoryError" mentioned in
the Oracle JVM docs.

Has anyone found a way to get a heap dump for a direct memory OOM?
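
In the meantime, here is a watchdog sketch we are considering (my own code,
nothing broker-specific; "DirectMemoryWatchdog" and the dump path are made
up, and I am assuming the JDK's "direct" BufferPoolMXBean tracks the same
counter that -XX:MaxDirectMemorySize limits): poll direct memory usage and
dump the heap via the HotSpot-specific HotSpotDiagnosticMXBean shortly
before the limit is reached, so we at least get a dump to analyze.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class DirectMemoryWatchdog implements Runnable
{
    // matches our -XX:MaxDirectMemorySize=4g; adjust as needed
    private static final long MAX_DIRECT = 4L * 1024 * 1024 * 1024;

    @Override
    public void run()
    {
        // the JDK has exposed a "direct" buffer pool MXBean since Java 7
        BufferPoolMXBean direct = ManagementFactory
                .getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .filter(b -> "direct".equals(b.getName()))
                .findFirst().orElseThrow(IllegalStateException::new);
        try
        {
            while (true)
            {
                // dump once usage crosses 95% of the limit, then stop
                if (direct.getMemoryUsed() > MAX_DIRECT * 0.95)
                {
                    HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                            ManagementFactory.getPlatformMBeanServer(),
                            "com.sun.management:type=HotSpotDiagnostic",
                            HotSpotDiagnosticMXBean.class);
                    diag.dumpHeap("/tmp/broker-direct-oom.hprof", true);
                    return;
                }
                Thread.sleep(5000);
            }
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }
}

We would start it from our broker patch as a daemon thread, e.g.
new Thread(new DirectMemoryWatchdog(), "dm-watchdog").start().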

- Ramayan

On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari <ramayan.tiwari@gmail.com>
wrote:

> Alex,
>
> Below are the flow to disk logs from a broker holding 3 million+ messages at
> this time. We only have one virtual host. Times are in GMT. It looks like
> flow to disk is activated for the whole virtual host, not at the queue level.
>
> When the same broker went OOM yesterday, I did not see any flow to disk
> logs between startup and the crash (it crashed twice within 4 hours).
>
>
> 4/19/17 4:17:43.509 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1014 : Message flow to disk active :  Message memory use 3356539KB
> exceeds threshold 3355443KB
> 4/19/17 2:31:13.502 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1015 : Message flow to disk inactive : Message memory use 3354866KB
> within threshold 3355443KB
> 4/19/17 2:28:43.511 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1014 : Message flow to disk active :  Message memory use 3358509KB
> exceeds threshold 3355443KB
> 4/19/17 2:20:13.500 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1015 : Message flow to disk inactive : Message memory use 3353501KB
> within threshold 3355443KB
> 4/19/17 2:18:13.500 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1014 : Message flow to disk active :  Message memory use 3357544KB
> exceeds threshold 3355443KB
> 4/19/17 2:08:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1015 : Message flow to disk inactive : Message memory use 3353236KB
> within threshold 3355443KB
> 4/19/17 2:08:13.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1014 : Message flow to disk active :  Message memory use 3356704KB
> exceeds threshold 3355443KB
> 4/19/17 2:00:43.500 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1015 : Message flow to disk inactive : Message memory use 3353511KB
> within threshold 3355443KB
> 4/19/17 2:00:13.504 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1014 : Message flow to disk active :  Message memory use 3357948KB
> exceeds threshold 3355443KB
> 4/19/17 1:50:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1015 : Message flow to disk inactive : Message memory use 3355310KB
> within threshold 3355443KB
> 4/19/17 1:47:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1014 : Message flow to disk active :  Message memory use 3365624KB
> exceeds threshold 3355443KB
> 4/19/17 1:43:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1015 : Message flow to disk inactive : Message memory use 3355136KB
> within threshold 3355443KB
> 4/19/17 1:31:43.509 AM INFO  [Housekeeping[test]] - [Housekeeping[test]]
> BRK-1014 : Message flow to disk active :  Message memory use 3358683KB
> exceeds threshold 3355443KB
>
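> For reference, the logged threshold is consistent with 80% of our 4g direct
> memory limit (my arithmetic; I am assuming the broker derives the threshold
> from max direct memory):
>
>   4294967296 bytes (4g) * 0.80 = 3435973836.8 bytes ~= 3355443.2 KB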
>
> After the production release (2 days ago), we have seen 4 crashes across 3
> different brokers; this is the most pressing concern in deciding whether we
> should roll back to 0.32. Any help is greatly appreciated.
>
> Thanks
> Ramayan
>
> On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy <orudyy@gmail.com> wrote:
>
>> Ramayan,
>> Thanks for the details. I would like to clarify: was flow to disk
>> triggered today for the 3 million messages?
>>
>> The following logs are issued for flow to disk:
>> BRK-1014 : Message flow to disk active :  Message memory use {0,number,#}KB exceeds threshold {1,number,#.##}KB
>> BRK-1015 : Message flow to disk inactive : Message memory use {0,number,#}KB within threshold {1,number,#.##}KB
>>
>> Kind Regards,
>> Alex
>>
>>
>> On 19 April 2017 at 17:10, Ramayan Tiwari <ramayan.tiwari@gmail.com>
>> wrote:
>>
>> > Hi Alex,
>> >
>> > Thanks for your response, here are the details:
>> >
>> > We use a "direct" exchange without persistence (we specify NON_PERSISTENT
>> > when sending from the client) and use the BDB store. We use the JSON
>> > virtual host type. We are not using SSL.
>> >
>> > When the broker went OOM, we had around 1.3 million messages with an
>> > average message size of 100 bytes. Direct memory allocation (the value
>> > read from the MBean) kept going up, even though it shouldn't need more DM
>> > to store that many messages. Allocated DM sat at 99% for about three and
>> > a half hours before the crash.
>> >
>> > Today, the same broker has 3 million messages (same message size) and
>> > allocated DM is only at 8%. This looks like a de-allocation issue or a
>> > leak.
>> >
>> > I have uploaded the memory utilization graph here:
>> > https://drive.google.com/file/d/0Bwi0MEV3srPRVHFEbDlIYUpLaUE/view?usp=sharing
>> > The blue line is DM allocated, yellow is DM used (sum of queue payloads),
>> > and red is heap usage.
>> >
>> > Thanks
>> > Ramayan
>> >
>> > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr Rudyy <orudyy@gmail.com>
>> wrote:
>> >
>> > > Hi Ramayan,
>> > >
>> > > Could you please share with us the details of the messaging use case(s)
>> > > which ended up in OOM on the broker side? I would like to reproduce the
>> > > issue on my local broker in order to fix it. I would appreciate it if
>> > > you could provide as many details as possible, including messaging
>> > > topology, message persistence type, message sizes, volumes, etc.
>> > >
>> > > Qpid Broker 6.0.x uses direct memory for keeping message content and
>> > > for receiving/sending data. Each plain connection utilizes 512K of
>> > > direct memory; each SSL connection uses 1M. Your memory settings look
>> > > OK to me.
>> > >
>> > > Kind Regards,
>> > > Alex
>> > >
>> > >
>> > > On 18 April 2017 at 23:39, Ramayan Tiwari <ramayan.tiwari@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > We are using Java broker 6.0.5, with a patch to use the
>> > > > MultiQueueConsumer feature. We just finished deploying to production
>> > > > and saw a couple of instances of broker OOM due to running out of
>> > > > direct memory buffers (exceptions at the end of this email).
>> > > >
>> > > > Here is our setup:
>> > > > 1. Max heap 12g, max direct memory 4g (this is the opposite of the
>> > > > recommendation; however, for our use case the message payload is
>> > > > really small, ~400 bytes, and is way less than the per-message
>> > > > overhead of 1KB). In perf testing, we were able to put 2 million
>> > > > messages without any issues.
>> > > > 2. ~400 connections to the broker.
>> > > > 3. Each connection has 20 sessions, and there is one multi-queue
>> > > > consumer attached to each session, listening to around 1000 queues.
>> > > > 4. We are still using the 0.16 client (I know).
>> > > >
>> > > > With the above setup, the baseline direct memory utilization (without
>> > > > any messages) was around 230mb (with 410 connections each taking
>> > > > 500KB).
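>> > > >
>> > > > (Sanity check on that baseline, my arithmetic: 410 connections * 500KB
>> > > > = 205000KB, i.e. ~200mb; the remaining ~30mb is presumably the
>> > > > broker's own buffers.)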
>> > > >
>> > > > Based on our understanding of broker memory allocation, message
>> > > > payload should be the only thing adding to direct memory utilization
>> > > > (on top of the baseline); however, we are seeing something completely
>> > > > different. In our last broker crash, the broker was constantly running
>> > > > with 90%+ of direct memory allocated, even though the sum of message
>> > > > payloads across all queues was only 6-8% (these percentages are
>> > > > against the available DM of 4gb). During this period of high DM usage,
>> > > > heap usage was around 60% (of 12gb).
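>> > > >
>> > > > (In absolute terms, my arithmetic: 6-8% of 4gb is only ~245-330mb of
>> > > > payload, while 90% allocated is ~3.7gb, leaving roughly 3.4gb of
>> > > > direct memory unaccounted for.)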
>> > > >
>> > > > We would like some help in understanding the reason for these high DM
>> > > > allocations. Are there things other than message payload and AMQP
>> > > > connections that use DM and could be contributing to this high usage?
>> > > >
>> > > > Another thing that puzzles us is the de-allocation of DM byte
>> > > > buffers. From log mining of heap and DM utilization, de-allocation of
>> > > > DM doesn't correlate with heap GC. If anyone has seen any
>> > > > documentation related to this, it would be very helpful if you could
>> > > > share it.
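>> > > >
>> > > > For context on what we expected (a minimal, self-contained sketch
>> > > > using only standard JDK classes, not Qpid code): the native memory
>> > > > behind a DirectByteBuffer is freed by its Cleaner only when the buffer
>> > > > object itself is garbage collected, so DM release should normally
>> > > > follow heap GC of the owning objects:
>> > > >
>> > > > import java.lang.management.BufferPoolMXBean;
>> > > > import java.lang.management.ManagementFactory;
>> > > > import java.nio.ByteBuffer;
>> > > >
>> > > > public class DirectReleaseDemo
>> > > > {
>> > > >     public static void main(String[] args) throws Exception
>> > > >     {
>> > > >         BufferPoolMXBean direct = ManagementFactory
>> > > >                 .getPlatformMXBeans(BufferPoolMXBean.class).stream()
>> > > >                 .filter(b -> "direct".equals(b.getName()))
>> > > >                 .findFirst().orElseThrow(IllegalStateException::new);
>> > > >
>> > > >         ByteBuffer buf = ByteBuffer.allocateDirect(256 * 1024 * 1024);
>> > > >         System.out.println("used after allocate: " + direct.getMemoryUsed());
>> > > >
>> > > >         buf = null;   // drop the only reference to the buffer object
>> > > >         System.gc();  // its Cleaner frees the native memory on collection
>> > > >         Thread.sleep(1000);
>> > > >         System.out.println("used after GC: " + direct.getMemoryUsed());
>> > > >     }
>> > > > }
>> > > >
>> > > > Since the broker appears to pool and slice its direct buffers
>> > > > (QpidByteBuffer), a buffer returning to the pool would not show up as
>> > > > a drop in allocated DM, which might explain the missing correlation;
>> > > > that is our guess, not something we have confirmed.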
>> > > >
>> > > > Thanks
>> > > > Ramayan
>> > > >
>> > > >
>> > > > *Exceptions*
>> > > >
>> > > > java.lang.OutOfMemoryError: Direct buffer memory
>> > > > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
>> > > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
>> > > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
>> > > > at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.restoreApplicationBufferForWrite(NonBlockingConnectionPlainDelegate.java:93) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:60) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:506) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:285) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:124) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:504) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:87) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
>> > > > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
>> > > >
>> > > >
>> > > >
>> > > > *Second exception*
>> > > > java.lang.OutOfMemoryError: Direct buffer memory
>> > > > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
>> > > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
>> > > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
>> > > > at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.<init>(NonBlockingConnectionPlainDelegate.java:45) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.setTransportEncryption(NonBlockingConnection.java:625) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingConnection.<init>(NonBlockingConnection.java:117) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.NonBlockingNetworkTransport.acceptSocketChannel(NonBlockingNetworkTransport.java:158) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask$1.run(SelectorThread.java:191) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
>> > > > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
>> > > >
>> > >
>> >
>>
>
>
