Date: Thu, 06 Nov 2008 19:13:04 +0100
From: Manuel Teira Paz
Subject: Activemq, socket buffers and async dispatch
To: dev@activemq.apache.org
Hello.

Some time ago we started suffering deadlock problems in our system, which uses ActiveMQ (4.1) for its messaging needs. At first, I thought the problem was caused by the consumer threads, since they were writing to the transport socket (to send acks or to commit consumed messages), so I considered that enabling asyncDispatch could solve it.

After this "solution" failed completely (the deadlock kept happening), I reconsidered the scenario, and a new theory arose: the problem is mostly related to the ActiveMQ transport thread. This is the thread that reads from the socket, but on some occasions it also writes to it, as we can see in the org.apache.activemq.broker.TransportConnection code:

    this.transport.setTransportListener(new DefaultTransportListener() {
        public void onCommand(Object o) {
            Command command = (Command) o;
            Response response = service(command);
            if (response != null) {
                dispatchSync(response);
            }
        }

        public void onException(IOException exception) {
            serviceTransportException(exception);
        }
    });

So any serviced command that returns a response forces the transport thread, which runs this listener, to write to the socket in the dispatchSync call. To do so, it must acquire the MutexTransport lock. If at that very moment the socket buffer is full and one of the consumer threads is holding the MutexTransport lock, the deadlock happens. (The transport thread can also fall into the deadlock by itself, if its own write attempt fills the buffer.)

There is no way to recover from this situation: the only thread that could read from the socket is waiting for the MutexTransport lock, and the thread holding that lock will not release it until its socketWrite0 call returns. Since nobody is reading, that never happens.

Do you agree with this explanation? Did I miss something? Is this any better in the 5.x series?
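To check the reasoning above, here is a minimal, self-contained Java model of the lock ordering. It does not use ActiveMQ at all and is only a sketch under these assumptions: a ReentrantLock stands in for the lock taken in MutexTransport.oneway(), a full ArrayBlockingQueue of capacity 1 stands in for a full socket buffer, and the class and method names (DeadlockSketch, transportThreadBlocked) are hypothetical. The calling thread plays the transport thread; since it cannot take the mutex, it would never get back to reading.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {

    /** Returns true if the simulated transport thread cannot obtain the write mutex. */
    public static boolean transportThreadBlocked() {
        // Hypothetical stand-in for a socket send buffer that is already full.
        BlockingQueue<String> sendBuffer = new ArrayBlockingQueue<>(1);
        sendBuffer.add("pending-bytes");
        // Hypothetical stand-in for the lock taken in MutexTransport.oneway().
        ReentrantLock writeMutex = new ReentrantLock();

        // Consumer thread: takes the mutex, then blocks writing an ack because
        // the buffer is full and nothing drains it.
        Thread consumer = new Thread(() -> {
            writeMutex.lock();
            try {
                sendBuffer.put("ack");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                writeMutex.unlock();
            }
        }, "consumer");
        consumer.start();
        while (!writeMutex.isLocked()) {
            Thread.onSpinWait(); // wait until the consumer holds the mutex
        }

        // The calling thread plays the transport thread: like dispatchSync(), it
        // needs the mutex before it can return to its read loop (i.e. draining).
        // A plain lock() would wait forever; tryLock() with a timeout exposes it.
        boolean blocked = true;
        try {
            if (writeMutex.tryLock(500, TimeUnit.MILLISECONDS)) {
                writeMutex.unlock();
                blocked = false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Clean up: interrupting put() makes the consumer release the mutex.
        consumer.interrupt();
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("transport thread blocked: " + transportThreadBlocked());
    }
}
```

Running main prints "transport thread blocked: true". The model only shows the lock-ordering part of the argument; whether the real socket buffer stays full depends on the peer not reading, which the model simply assumes.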
Do you think that passing a TaskRunnerFactory to the TransportConnection constructor and changing that dispatchSync call to dispatchAsync could avoid the deadlock? Is there any drawback to this approach?

Thanks for your time. Any feedback would be very much appreciated, since the problem is stopping our production systems: once it happens, the consumers on the affected connection get stuck forever.

Best regards.

Extra bonus: stack traces. Here is a transport thread stuck in socketWrite0. Nobody else can write to the socket (this thread holds the MutexTransport lock), and it cannot read from the socket, since it is blocked writing:

    "ActiveMQ Transport: tcp:///127.0.0.1:17891" daemon prio=10 tid=0x00c4af30 nid=0x48 runnable [0x2dcff000..0x2dcff9f0]
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(Unknown Source)
        at java.net.SocketOutputStream.write(Unknown Source)
        at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:109)
        at java.io.DataOutputStream.flush(Unknown Source)
        at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:119)
        at org.apache.activemq.transport.InactivityMonitor.oneway(InactivityMonitor.java:145)
        at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:80)
        at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:93)
        at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:47)
        - locked <0x3d5625c0> (a java.lang.Object)
        at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1138)
        at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:805)
        at org.apache.activemq.broker.TransportConnection.dispatchSync(TransportConnection.java:770)
        at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:187)
        at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:65)
        at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:133)
        at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:124)
        at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:84)
        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:137)
        at java.lang.Thread.run(Unknown Source)
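The dispatchAsync idea asked about above can be checked against a small toy model in Java. This is a sketch under stated assumptions, not ActiveMQ code: a ReentrantLock stands in for the MutexTransport lock, a bounded ArrayBlockingQueue stands in for a full socket buffer, a single-thread ExecutorService stands in for a TaskRunner obtained from a TaskRunnerFactory, and AsyncDispatchSketch, allWritesComplete and writeUnderMutex are hypothetical names. It says nothing about how TransportConnection.dispatchAsync is actually implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class AsyncDispatchSketch {

    // Takes the mutex and writes one message, loosely like MutexTransport.oneway().
    private static void writeUnderMutex(ReentrantLock mutex, BlockingQueue<String> buffer, String msg) {
        mutex.lock();
        try {
            buffer.put(msg); // blocks while the buffer is full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            mutex.unlock();
        }
    }

    /** Returns true if both the consumer's ack and the response get written. */
    public static boolean allWritesComplete() {
        // Hypothetical stand-ins: a full socket send buffer and the MutexTransport lock.
        BlockingQueue<String> sendBuffer = new ArrayBlockingQueue<>(1);
        sendBuffer.add("pending-bytes");
        ReentrantLock writeMutex = new ReentrantLock();
        // Hypothetical stand-in for a TaskRunner from a TaskRunnerFactory (daemon thread).
        ExecutorService dispatcher = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "async-dispatch");
            t.setDaemon(true);
            return t;
        });

        Thread consumer = new Thread(() -> writeUnderMutex(writeMutex, sendBuffer, "ack"), "consumer");
        consumer.setDaemon(true);
        consumer.start();
        while (!writeMutex.isLocked()) {
            Thread.onSpinWait(); // wait until the consumer holds the mutex
        }

        boolean ok = false;
        try {
            // dispatchAsync analogue: the transport thread hands the response
            // write to another thread instead of taking the mutex itself...
            Future<?> responseWrite = dispatcher.submit(
                    () -> writeUnderMutex(writeMutex, sendBuffer, "response"));

            // ...so it can keep "reading", which drains the buffer.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while ((consumer.isAlive() || !responseWrite.isDone())
                    && System.nanoTime() < deadline) {
                sendBuffer.poll(50, TimeUnit.MILLISECONDS);
            }
            ok = !consumer.isAlive() && responseWrite.isDone();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            dispatcher.shutdownNow();
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("all writes complete: " + allWritesComplete());
    }
}
```

Running main prints "all writes complete: true": because the transport thread never waits on the mutex, it keeps draining, the consumer's ack goes through, and the queued response follows. Two costs are visible even in this toy version: an extra thread per connection, and the transport thread no longer knows that a response has been written before it reads the next command.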