From: "Filip Hanik (JIRA)"
To: dev@activemq.apache.org
Date: Mon, 3 Nov 2008 07:59:05 -0800 (PST)
Subject: [jira] Commented: (AMQ-1993) Systems hang due to inability to timeout socket write operation
Message-ID: <1844371431.1225727945447.JavaMail.jira@brutus>
In-Reply-To: <880698479.1225672385414.JavaMail.jira@brutus>

[ 
https://issues.apache.org/activemq/browse/AMQ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=47025#action_47025 ]

Filip Hanik commented on AMQ-1993:
----------------------------------

That being said, the upper layer can react to the propagating IO exception if needed. But that shouldn't be an issue; this filter does what it is supposed to do. We could add more parameters to make the behavior configurable.

> Systems hang due to inability to timeout socket write operation
> ---------------------------------------------------------------
>
>                 Key: AMQ-1993
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1993
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0, 5.2.0
>         Environment: Unix (Solaris and Linux tested)
>            Reporter: Filip Hanik
>            Assignee: Gary Tully
>            Priority: Critical
>         Attachments: patch-1-threadname-filter.patch, patch-3-tcp-writetimeout.patch
>
>
> The blocking Java Socket API doesn't have a timeout on socketWrite invocations.
> This means that if a TCP session is dropped or terminated without RST or FIN packets, the operating system is left to eventually time out the session. On the Linux kernel this timeout usually takes 15 to 30 minutes.
> For this entire period, the AMQ server hangs, and producers and consumers are unable to use a topic.
> I have created two patches for this at the page:
> http://www.hanik.com/covalent/amq/index.html
> Let me show a bit more
> ---------------------------------
> "ActiveMQ Transport: tcp:///X.YYY.XXX.ZZZZ:2011" daemon prio=10 tid=0x0000000055d39000 nid=0xc78 runnable [0x00000000447c9000..0x00000000447cac10]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> This is a thread stuck in blocking IO, and it can stay stuck for 30 minutes during the kernel TCP retransmission attempts.
> Unfortunately the thread dump is very misleading, since the name of the thread is not the destination or even remotely related to the socket it is operating on.
> To mend this, a very simple (and configurable) ThreadNameFilter has been suggested in the patch. It appends the destination and helps the system administrator correctly identify the client that is about to receive data.
> -----------------------------------
>         at org.apache.activemq.broker.region.Topic.dispatch(Topic.java:581)
>         at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:421)
>         - locked <0x00002aaaec155818> (a org.apache.activemq.broker.region.Topic)
>         at org.apache.activemq.broker.region.Topic.send(Topic.java:363)
> The lock being held at this point unfortunately makes the entire Topic single threaded.
> While this lock is held, no other clients (producers and consumers) can publish to or receive from this topic.
> And this lock can be held for up to 30 minutes.
> I consider solving this single threaded behavior a 'feature enhancement' that should be handled separately from this bug, because even if it is solved, threads still risk being stuck in socketWrite0 for dropped connections that still appear to be established.
> For this, I have implemented a socket timeout filter based on a TransportFilter. This filter only times out connections that are actually writing data.
> The two patches are at:
> http://www.hanik.com/covalent/amq/patch-1-threadname-filter.patch
> http://www.hanik.com/covalent/amq/patch-3-tcp-writetimeout.patch
> The binary 0000.jar applies to both 5.1 and trunk and can be used today in existing environments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
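
The idea behind the write timeout described in the issue (time out only connections that are in the middle of a write) can be illustrated with a small standalone sketch. This is not the code from patch-3-tcp-writetimeout.patch and does not use the ActiveMQ TransportFilter API; the class name WriteTimeoutStream, its constructor parameters, and the watchdog thread name are all hypothetical. It assumes that closing the underlying connection from another thread makes the blocked write fail with an IOException, which is the documented behavior of java.net.Socket.close().

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; NOT the class from patch-3-tcp-writetimeout.patch.
final class WriteTimeoutStream extends OutputStream {

    // One shared daemon thread that fires the timeouts.
    private static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "write-timeout-watchdog");
                t.setDaemon(true);
                return t;
            });

    private final OutputStream delegate;
    private final Closeable connection;  // typically the java.net.Socket behind 'delegate'
    private final long timeoutMillis;

    WriteTimeoutStream(OutputStream delegate, Closeable connection, long timeoutMillis) {
        this.delegate = delegate;
        this.connection = connection;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // The timer is armed only while a write is in progress,
        // so an idle connection is never timed out.
        Runnable abort = this::closeConnection;
        ScheduledFuture<?> timer =
                WATCHDOG.schedule(abort, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            delegate.write(buf, off, len);
        } finally {
            // The write returned or failed: disarm the timer.
            timer.cancel(false);
        }
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    // Closing the connection is what unblocks a thread stuck in the write.
    private void closeConnection() {
        try {
            connection.close();
        } catch (IOException ignored) {
            // Nothing useful to do here; the writer sees its own IOException.
        }
    }
}
```

With a real socket this would be used roughly as new WriteTimeoutStream(socket.getOutputStream(), socket, 30000). The resulting IOException then propagates to the upper layer, which can react to it as mentioned in the comment above. The sketch does nothing about the Topic lock itself; it only bounds how long a writer can sit in socketWrite0 while holding it.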