Date: Thu, 06 Nov 2008 19:24:42 +0100
From: Manuel Teira Paz
To: dev@activemq.apache.org
Subject: Re: Activemq, socket buffers and async dispatch

Hello again.

Actually, both sides of the socket need to fill up at nearly the same
time for the deadlock to occur. But the possibility is there.

Regards.

Manuel Teira Paz wrote:
> Hello.
>
> Some time ago, we started suffering deadlock problems in our system,
> which uses activemq (4.1) to handle its messaging needs.
>
> At first I thought the problem was caused by the consumer threads,
> since they were writing to the transport socket (to send acks or to
> commit consumed messages), so I considered that enabling asyncDispatch
> could be a solution for this problem.
>
> After the complete failure of this "solution" (the deadlock keeps
> happening) I reconsidered the scenario, and a new theory arose: the
> problem is mostly related to the activemq transport thread. This is
> the thread that reads from the socket but also, on some occasions,
> writes to it, as we can see in the
> org.apache.activemq.broker.TransportConnection code:
>
>     this.transport.setTransportListener(new DefaultTransportListener() {
>         public void onCommand(Object o) {
>             Command command = (Command) o;
>             Response response = service(command);
>             if (response != null) {
>                 dispatchSync(response);
>             }
>         }
>
>         public void onException(IOException exception) {
>             serviceTransportException(exception);
>         }
>     });
>
> So any serviced command that returns a response forces the transport
> listener to write to the socket in the dispatchSync call. To do so, it
> will try to lock the MutexTransport. If at that very moment the socket
> buffer is getting full and one of the consumer threads is holding the
> MutexTransport, the deadlock will happen (the Transport thread could
> also fall into the deadlock itself if its write attempt fills the buffer).
> There's no way to recover from this situation, since the only thread
> that could read from the socket is trying to get the MutexTransport
> lock, and the thread holding it will never release it until its
> socketWrite0 call finishes. Since nobody is reading, this will never
> happen.
>
> Do you agree with this explanation? Did I miss something?
>
> Is this any better in the 5.x series?
>
> Do you think that passing a TaskRunnerFactory in the TransportConnection
> constructor and changing that call from dispatchSync to dispatchAsync
> could avoid the deadlock?
> Is there any drawback to this approach?
>
> Thanks for your time. Any feedback will be much appreciated, since the
> problem is stopping our production systems. Once it happens, the
> consumers on the problematic connection get stuck forever.
>
> Best regards.
>
> Extra bonus: stack traces. Here is a Transport thread stuck in
> socketWrite0. Nobody else can write to the socket, and this thread
> won't be able to read, since it is blocked writing:
>
> "ActiveMQ Transport: tcp:///127.0.0.1:17891" daemon prio=10 tid=0x00c4af30 nid=0x48 runnable [0x2dcff000..0x2dcff9f0]
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(Unknown Source)
>     at java.net.SocketOutputStream.write(Unknown Source)
>     at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:109)
>     at java.io.DataOutputStream.flush(Unknown Source)
>     at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:119)
>     at org.apache.activemq.transport.InactivityMonitor.oneway(InactivityMonitor.java:145)
>     at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:80)
>     at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:93)
>     at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:47)
>     - locked <0x3d5625c0> (a java.lang.Object)
>     at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1138)
>     at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:805)
>     at org.apache.activemq.broker.TransportConnection.dispatchSync(TransportConnection.java:770)
>     at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:187)
>     at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:65)
>     at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:133)
>     at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:124)
>     at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:84)
>     at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:137)
>     at java.lang.Thread.run(Unknown Source)
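
The dispatchAsync idea asked about in the quoted message can be
illustrated outside ActiveMQ. What follows is a minimal, hypothetical
Java model, not ActiveMQ code: the class AsyncDispatchSketch, its
methods, and the one-slot queue standing in for a socket send buffer are
all assumptions made for illustration. It only shows the general pattern
of the reading thread handing responses to a separate writer thread, so
that the reader itself never blocks on a write.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not ActiveMQ code: every name here is invented for illustration.
class AsyncDispatchSketch {
    // Stands in for the socket send buffer: put() blocks while it is full.
    private final BlockingQueue<String> socketBuffer = new ArrayBlockingQueue<>(1);
    // Stands in for the lock that serializes writers (the MutexTransport role).
    private final Object writeLock = new Object();
    // One dedicated writer thread (the role a task runner might play).
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Called by the reading thread: queues the write and returns immediately.
    void dispatchAsync(String response) {
        writer.execute(() -> {
            synchronized (writeLock) {
                try {
                    socketBuffer.put(response); // may block, but only the writer thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Models the peer draining the socket; returns null on timeout or interrupt.
    String peerRead() {
        try {
            return socketBuffer.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    void shutdown() {
        writer.shutdown();
    }

    // Queues three responses against a one-slot buffer, then drains them.
    static int demo() {
        AsyncDispatchSketch connection = new AsyncDispatchSketch();
        int total = 3;
        // A synchronous write would block here on the second response,
        // because the buffer holds only one; the async hand-off does not.
        for (int i = 0; i < total; i++) {
            connection.dispatchAsync("response-" + i);
        }
        int received = 0;
        for (int i = 0; i < total; i++) {
            if (connection.peerRead() != null) {
                received++;
            }
        }
        connection.shutdown();
        return received;
    }

    public static void main(String[] args) {
        System.out.println("responses delivered: " + demo());
    }
}
```

In this model the queue of pending responses inside the executor is
unbounded, so a peer that stops reading makes memory grow instead of
blocking the reader. Whether that trade-off is acceptable inside the
broker is part of the open question about drawbacks.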