activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jim_b_o <ja...@inaseq.com>
Subject Artemis use of PhantomReference impacting GC performance
Date Thu, 04 Feb 2016 16:29:25 GMT
I have observed a significant degradation in GC performance after upgrading
from JBoss 4.2.3.GA to WildFly 10.0.0.Final.  I'm 99% sure the difference is
due to Artemis ActiveMQ making heavy use of PhantomReference objects
(directly or indirectly).  I'm raising this discussion to see if anyone else
can observe the same behaviour and determine whether to raise an issue
against Artemis or one of its dependencies.

What's changed?


I have JBoss 4.2.3.GA (with jboss-messaging-1.4.5.GA) and WildFly
10.0.0.Final running on identical hardware with identical JVMs (8u71) and GC
settings (using G1GC).  The application is only minimally changed as
required by the API differences between the old and new servers.  The
application load is identical and results in near identical memory
allocation rate and young GC frequency.

*The difference is that on JBoss each young GC takes around 10-12ms whereas
on WildFly they take 16-20ms.*  The additional time is taken in the Object
Copy and Ref Proc phases.  This indicates that there are more Survivor
objects on each GC and more References being processed.  I'm pretty sure the
additional Survivor objects are also the Reference objects.  Running a Java
Mission Control (JMC) Flight Recording on these two systems under identical
production load, WildFly is showing around 11,000 PhantomReferences
processed each GC versus a handful on JBoss.  The number of
PhantomReferences seems to approximately correspond to the number of JMS
messages transmitted between GCs.

Some evidence


In a separate test environment I have WildFly running under a standard test
load.  It is emitting JMS messages at a consistent rate (lower than
production) onto a non-durable Topic.  The following image shows a 2 minute
JMC recording when there is no subscriber on the Topic.  We can see that
there are very few PhantomReferences and the GCs take around 4ms.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithoutConsumer.png> 

However once I start up an external JMS client subscribed to the Topic the
PhantomReferences appear and the GCs slow down as shown in the next image. 
The messaging is non-XA with auto-ack.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithConsumer.png> 

Where do the PhantomReferences come from?


I used the debugger to capture where the PhantomReferences are being
created.  This seems to indicate they are generated by Artemis and/or the
associated XNIO remoting...

Thread [Thread-1
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)]
(Suspended (breakpoint at line 80 in PhantomReference))	
	Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line:
80	
	Cleaner.<init>(Object, Runnable) line: 115	
	Cleaner.create(Object, Runnable) line: 133	
	DirectByteBuffer.<init>(int) line: 139	
	ByteBuffer.allocateDirect(int) line: 311	
	UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108	
	UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69	
	UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int)
line: 155	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line:
146	
	NettyServerConnection.createTransportBuffer(int) line: 38	

RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int)
line: 162	
	SessionReceiveMessage.encode(RemotingConnection) line: 59	
	ChannelImpl.send(Packet, boolean, boolean) line: 225	
	ChannelImpl.sendBatched(Packet) line: 205	
	CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int) line:
84	
	ServerConsumerImpl.deliverStandardMessage(MessageReference, ServerMessage)
line: 883	
	ServerConsumerImpl.proceedDeliver(MessageReference) line: 366	
	QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358	
	QueueImpl.deliver() line: 1873	
	QueueImpl.access$1400(QueueImpl) line: 97	
	QueueImpl$DeliverRunner.run() line: 2581	
	OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100	
	ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142	
	ThreadPoolExecutor$Worker.run() line: 617	
	Thread.run() line: 745	
Thread [default I/O-5] (Suspended (breakpoint at line 80 in
PhantomReference))	
	Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line:
80	
	Cleaner.<init>(Object, Runnable) line: 115	
	Cleaner.create(Object, Runnable) line: 133	
	DirectByteBuffer.<init>(int) line: 139	
	ByteBuffer.allocateDirect(int) line: 311	
	UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108	
	UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69	
	UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int)
line: 155	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line:
146	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107	
	AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line:
104	

AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel)
line: 426	
	AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371	
	ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line:
92	
	ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66	
	NioSocketConduit.handleReady(int) line: 88	
	WorkerThread.run() line: 559	
Thread [default I/O-4] (Suspended (breakpoint at line 80 in
PhantomReference))	
	Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line:
80	
	Cleaner.<init>(Object, Runnable) line: 115	
	Cleaner.create(Object, Runnable) line: 133	
	DirectByteBuffer.<init>(int) line: 139	
	ByteBuffer.allocateDirect(int) line: 311	
	UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108	
	UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69	
	UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int)
line: 155	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line:
146	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107	
	AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line:
104	

AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel)
line: 426	
	AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371	
	ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line:
92	
	ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66	
	NioSocketConduit.handleReady(int) line: 88	
	WorkerThread.run() line: 559	
Thread [default I/O-2] (Suspended (breakpoint at line 80 in
PhantomReference))	
	Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line:
80	
	Cleaner.<init>(Object, Runnable) line: 115	
	Cleaner.create(Object, Runnable) line: 133	
	DirectByteBuffer.<init>(int) line: 139	
	ByteBuffer.allocateDirect(int) line: 311	
	UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108	
	UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69	
	UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int)
line: 155	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line:
146	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107	
	AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line:
104	

AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel)
line: 426	
	AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371	
	ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line:
92	
	ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66	
	NioSocketConduit.handleReady(int) line: 88	
	WorkerThread.run() line: 559
	
What's next?


So it's seems pretty certain that this is where the change is coming from. 
The total GC load is still low but we're aiming for consistent low latency
and this is a backwards step.

I'd be interested in any comments or questions.  I can provide more info if
required.

Thanks.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Mime
View raw message