Date: Wed, 2 Feb 2005 22:51:42 +0100
Message-Id: <200502022151.j12Lpgef008747@ajax.apache.org>
From: bugzilla@apache.org
To: tomcat-dev@jakarta.apache.org
Subject: DO NOT REPLY [Bug 33374] New: - Connections are not reset under heavy load with lots of closed remote connections

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=33374>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=33374

           Summary: Connections are not reset under heavy load with lots
                    of closed remote connections
           Product: Tomcat 5
           Version: 5.0.28
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Native:JK
        AssignedTo: tomcat-dev@jakarta.apache.org
        ReportedBy: lars@worldlingo.com


Hi,

There is a problem with mod_jk and Tomcat under high load that causes slots to
be kept by Apache even though the backend Tomcat link has gone down. It is
triggered by eager user aborts, i.e. web clients that abort a request by
simply closing the connection. You get a broken pipe on the Tomcat side, and
eventually that is cleared out, so under light traffic this is not an issue.

Now consider that you get hammered with requests and many of them are aborted,
say 10 out of 30 per second. There are two issues with that, both of which I
will file as separate reports:

1. Reporting errors to user code
2. Closing connections

We use Apache/2.0.52 (Unix) mod_ssl/2.0.52 OpenSSL/0.9.7d mod_jk/1.2.6 and
Tomcat 5.0.28.

This report is for problem #2 above. For some reason, when we get the broken
pipe exceptions reported in bug #33371, we run into big problems with Apache.
What happens is that the Apache scoreboard fills up with connections (see
attached file): all slots show status "W", which means the request is being
worked on by the handler, in this case mod_jk. This only happens once we
exceed a certain number of requests per second; below that it causes no
problems.
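(For illustration: a minimal, self-contained Java sketch of how such a client
abort surfaces as an IOException on the writing side. This is hypothetical
demo code, not Tomcat source; the exact exception and timing are
platform-dependent.)

import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BrokenPipeDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);

        // The "user": connects, then aborts by closing without reading.
        Socket client = new Socket("localhost", server.getLocalPort());
        Socket accepted = server.accept();
        client.close();

        // Give the kernel a moment to deliver the FIN/RST to the server.
        Thread.sleep(100);

        OutputStream os = accepted.getOutputStream();
        try {
            // The first write may still succeed (it only fills the local
            // socket buffer); a later write typically fails with
            // "Broken pipe" or "Connection reset" once the peer RSTs.
            for (int i = 0; i < 10; i++) {
                os.write(new byte[8192]);
                os.flush();
            }
            System.out.println("No error surfaced (timing-dependent).");
        } catch (IOException ex) {
            System.out.println("Write to aborted client failed: " + ex.getMessage());
        } finally {
            accepted.close();
            server.close();
        }
    }
}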
But when we go over the limit and the slots fill up, what we notice is that we
get heaps of "Broken pipe" exceptions on the Tomcat side. We have 9-10 app
servers, all running Tomcat and load-balanced by mod_jk, and all of them start
to show these broken pipe exceptions. It is not that we never see those
exceptions; of course when a user aborts, the exception turns up in the logs.
But at peak time, when the slots fill up in Apache, we get heaps of them, i.e.
a couple per second.

Those errors are probably caused by the impatient calling system closing
connections all the time because we respond a bit too slowly. But shouldn't
those connections then be closed all the way, i.e. also on the Apache side, in
other words the connections between Apache and the user? That is the part that
seems to fail.

Furthermore we get heaps of stray connections on the Tomcat machines. If you
run "netstat -p -n" you see heaps of connections to local port 8009 (AJP13)
with no process attached to them.

I tried to fix this by applying the following patch to Tomcat 5.0.28, file
/jakarta-tomcat-5.0.28-src/jakarta-tomcat-connectors/jk/java/org/apache/jk/common/ChannelSocket.java:

267d266
< final int error=8;
507,513c506
< try {
<     os.write( buf, 0, len );
< } catch (IOException ex) {
<     ep.setNote( error, Boolean.TRUE );
<     log.warn("Flagging endpoint as in error. " + ep.getNote( socketNote ) + "\nError was: " + ex.getMessage() );
<     throw ex;
< }
---
> os.write( buf, 0, len );
522,528c515
< try {
<     os.flush();
< } catch (IOException ex) {
<     ep.setNote( error, Boolean.TRUE );
<     log.warn("Flagging endpoint as in error. " + ep.getNote( socketNote ) + "\nError was: " + ex.getMessage() );
<     throw ex;
< }
---
> os.flush();
681c668
< log.warn("Closing ajp connection " + status + " - " + ep.getNote( socketNote ));
---
> log.warn("Closing ajp connection " + status );
691,695d677
< break;
< }
< if ( ep.getNote ( error ) == Boolean.TRUE ) {
<     log.warn("processConnection: Error reported, closing connection. " + ep.getNote( socketNote ));
<     ep.setNote( error, null );

What it does is remember when there was an error during IO; when control comes
back all the way up the call stack, the connection is eventually closed. This
is a problem in itself: as reported in bug #33371, the user code does not get
the error, and even if it did, it could swallow the exception just like the
current code does, in which case the low-level code would not handle the
problem at all. My patch makes sure that connections whose IO has thrown an
error do get closed. That took care of our stray connections, and things
worked better for a while, but now we are getting the problem again. The
bottom line seems to be that my patch is useful and should in some form become
part of the Coyote code, but under heavy load the problem is still not solved.

If you investigate the attached scoreboard dump, you will see how long some of
those connections have been sitting there: some of the values in the "Seconds
Since" (SS) column are greater than 800! And we set all the timeouts we could
find to lower values:

httpd.conf:        Timeout 60
worker.properties: worker.tc1.reply_timeout=120000
worker.properties: worker.tc1.socket_timeout=60

Why do they get stuck? I am pretty sure they are all those broken pipe
exception connections we get, but since applying my patch I do close them all,
so why does that not flow back to mod_jk?
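(The pattern the patch implements, restated as a self-contained sketch. The
Endpoint class below is a simplified stand-in for the per-connection
setNote()/getNote() slots used in the diff above; illustration only, not the
actual Tomcat types.)

import java.io.IOException;
import java.io.OutputStream;

class Endpoint {
    // Simplified stand-in for the per-connection note slots.
    private final Object[] notes = new Object[16];
    void setNote(int i, Object v) { notes[i] = v; }
    Object getNote(int i) { return notes[i]; }
}

class ErrorFlaggingSketch {
    static final int ERROR_NOTE = 8; // note slot used to flag an IO error

    // Send path: flag the endpoint before rethrowing, because code further
    // up the call stack may swallow the exception.
    void send(Endpoint ep, OutputStream os, byte[] buf, int len)
            throws IOException {
        try {
            os.write(buf, 0, len);
            os.flush();
        } catch (IOException ex) {
            ep.setNote(ERROR_NOTE, Boolean.TRUE);
            throw ex;
        }
    }

    // Connection loop: once control returns after a request, a flagged
    // endpoint forces the AJP connection to be closed instead of being
    // kept open for the next request.
    boolean shouldClose(Endpoint ep) {
        if (Boolean.TRUE.equals(ep.getNote(ERROR_NOTE))) {
            ep.setNote(ERROR_NOTE, null); // reset the flag
            return true;
        }
        return false;
    }
}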
The biggest problem of all is that mod_jk creates a new connection to the app
servers but does not clear out the previous connection to the user. When all
1000 slots on our 3 web servers are full and I take a thread dump on the 9-10
app servers, all 150 configured TP-Processor threads are idle! They all show
this state:

"TP-Processor183" daemon prio=1 tid=0x5af00018 nid=0x626a runnable [8bfff000..8bfffb58]
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:222)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:277)
        - locked <0x4e030028> (a java.io.BufferedInputStream)
        at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:611)
        at org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:548)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:676)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:890)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
        at java.lang.Thread.run(Thread.java:534)

This means they are waiting for a new request to come in, but that never
happens, since all Apache slots are full.

To recap:

- Connections (user <-> Apache) do not get reset or released (quickly enough)
  under heavy load when they should (e.g. when the user closes the
  connection).
- Closing the connection in the Coyote code only partially helps.
- mod_jk creates a new connection to Tomcat, but the old connection to the
  user is not dropped.

I am sorry that I cannot assist any further yet, but I keep reading the mod_jk
and Tomcat code to see if I can find anything. Until then I wanted to get the
issue out there in the hope that someone else may have an idea.

Regards,
Lars George, CTO WorldLingo

--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
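(A closing observation on the thread dump above: the worker threads park in
socketRead0 with, apparently, no read timeout, so a thread holding a dead AJP
connection blocks indefinitely. The sketch below is a hypothetical, simplified
worker loop, not actual Tomcat code, showing how a socket read timeout would
let such a thread abandon a silent connection and return to the pool; by
itself it would not fix the Apache-side slot leak described above.)

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

class AjpWorkerLoopSketch {
    void processConnection(Socket socket) throws IOException {
        // With a read timeout set, a thread blocked on a dead connection
        // eventually gets a SocketTimeoutException instead of sitting in
        // socketRead0 forever. 60000 ms is an arbitrary example value.
        socket.setSoTimeout(60000);
        InputStream in = socket.getInputStream();
        byte[] buf = new byte[8192];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                // ... parse and handle the next AJP message in buf[0..n) ...
            }
        } catch (SocketTimeoutException idle) {
            // No data arrived within the timeout; treat the connection
            // as dead and fall through to close it.
        } finally {
            // Either way, close the socket so the thread goes back to
            // the pool instead of holding the connection.
            socket.close();
        }
    }
}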