From: "Tim Whittington"
To: dev@tomcat.apache.org
Date: Mon, 20 Nov 2006 12:48:39 +1300
Subject: Problems with managing sizing of processor pools in web server, JK and Tomcat

A recent discussion on a patch for the IIS ISAPI Redirector (http://issues.apache.org/bugzilla/show_bug.cgi?id=40967) raised some issues with the current way JK handles the sizing of the connection pool and its relationship to the threads in the web server process.

Historically (prior to 1.2.16 or so, I think) JK had a cachesize property and implemented a cache of AJP connections. If a request was made via JK for an endpoint and there were no free endpoints in the cache, then a new endpoint was created and used. This meant that the Web Server -> JK -> Tomcat interaction would work pretty well without much tuning - the only way to break the config was to size the Max Processors on the AJP Connector smaller than the cache size of the JK connector.

Around 1.2.16 the cache was changed to a hard-limited connection pool, which fails with an error if a thread requests an endpoint and there isn't one available in the pool.
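In rough pseudo-C terms (a deliberately simplified sketch - not the actual mod_jk source, and all names below are made up for illustration) the change looks like this:

    /* Illustrative only: hypothetical names, not the real jk_endpoint API. */
    typedef struct endpoint endpoint_t;   /* an AJP connection to Tomcat    */
    typedef struct pool     pool_t;       /* the JK endpoint cache/pool     */

    endpoint_t *take_free_slot(pool_t *p);      /* NULL if nothing is free  */
    endpoint_t *create_new_endpoint(void);      /* opens a fresh connection */

    /* Pre-1.2.16 "cache" behaviour: if nothing is free, create a new
     * endpoint, so the web server thread never fails at this point.   */
    endpoint_t *get_endpoint_cache(pool_t *p)
    {
        endpoint_t *e = take_free_slot(p);
        return e != NULL ? e : create_new_endpoint();
    }

    /* Current hard-limited pool behaviour: if nothing is free the request
     * fails immediately, and the caller has to turn that into a sensible
     * HTTP error response.                                                */
    endpoint_t *get_endpoint_pool(pool_t *p)
    {
        return take_free_slot(p);   /* NULL means an error for the caller */
    }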
This now requires that the following parameters be set to exactly the same values for correct operation:

- Web server threads per process
- JK connection_pool_size
- AJP Connector MaxPoolThreads

(An example of a matched configuration is sketched at the end of this mail.)

If the web server threads per process don't match the JK connection_pool_size setting, then there is either wastage or breakage:

- If there are more web server threads, there is the potential that a request will fail to obtain an endpoint and generate an error
- If there are fewer web server threads, the extra connection pool slots are never used

If the JK connection_pool_size doesn't match the AJP Connector MaxPoolThreads, then again there is either wastage or breakage:

- If there are more JK connection slots, there could be a service failure when too many concurrent requests are made
- If there are fewer JK connection slots, the AJP connector will never reach its max processors

An illustration of this fragility is the fact that when the cache was changed to a connection pool, the IIS ISAPI Redirector was not fixed. The default cachesize was set to 10, making it very likely that the connection pool would be breached on a server system, and the ExtensionProc was never updated to handle the endpoint allocation failure, so IIS silently drops the TCP connection when this happens (ISAPI requires the ISAPI extension to handle error conditions and send appropriate HTTP responses). I don't know what Apache's behaviour would be, and it's currently masked since the pool size is always set to the correct threads per process in mod_jk. (This was one of several serious bugs I've found recently in the IIS connector, introduced by changes to common without updating the IIS code.)

I submitted a patch for the IIS connector to size the connection pool to the max threads in IIS (I'll discuss the vagaries of how to determine this separately), which raised some comments from Mladen that the default sizes for IIS/Apache are too large compared with their AJP Connector counterparts in Tomcat 5.5/6 and their respective defaults.

I see a few ways forward on this:

- Keep the current structure, fix the IIS autodetect, and document that you should probably not customise connection_pool_size any more and should size the Tomcat AJP Connector appropriately (or at least be very careful)
- Make the JK connection pool blocking and reduce the default sizes to match the defaults that the Tomcat AJP Connectors ship with
- Another similar option would be to use a worker thread pool to handle JK requests asynchronously to the web server thread, which is the recommended approach for ISAPI extensions - I don't know how valid this is in Apache though
- Revert the connection pool to a caching behaviour

I'd suggest that a worker pool + async requests is the most robust option (assuming this is a kosher approach for Apache - it's certainly the best approach for IIS), with a blocking hard-limited connection pool being the second-best option.

tim
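P.S. For concreteness, here is a sketch of what a "matched" configuration would look like with the current hard-limited pool. The values and the worker name are illustrative, and the AJP Connector attribute is my reading of the 5.5/6 docs (maxThreads, which replaced maxProcessors in older versions), so check it against the Tomcat version in use:

    # workers.properties (mod_jk / ISAPI redirector)
    worker.ajp13w.type=ajp13
    worker.ajp13w.host=localhost
    worker.ajp13w.port=8009
    # should equal the web server's threads per process
    # (e.g. Apache worker MPM ThreadsPerChild, or the IIS thread pool size)
    worker.ajp13w.connection_pool_size=250

    <!-- Tomcat server.xml: the AJP Connector must be able to service at
         least connection_pool_size connections from each web server process -->
    <Connector port="8009" protocol="AJP/1.3" maxThreads="250" />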