Subject: Re: tomcat thread incurring CPU load
From: Mark Thomas <markt@apache.org>
To: users@tomcat.apache.org
Date: Wed, 13 Nov 2019 15:38:09 +0000
On 12/11/2019 19:11, M. Manna wrote:
> Hi Mark,
>
> Following my previous reply, we have now confirmed that it is indeed
> 8.5.45 with APR 1.2.23 that is causing this high JVM CPU usage.
> We took 2 of our 50 servers out of the load balancer configuration,
> reverted Tomcat, and redeployed. With near-identical user traffic,
> the two servers are responding normally on 8.5.41, with or without
> traffic, and the JVM dump looks a lot better with 8.5.41.
>
> We do think that the recent changes in APR, or in some other Tomcat
> jar, may have caused a compatibility issue on the Windows Server 2016
> (64-bit) platform. Unfortunately, we cannot pinpoint exactly which
> change may have caused this (i.e. the actual OS vs. its security
> updates). With this in mind, we are also wary of moving to 8.5.47, as
> we don't know whether the same issue will occur again. Since 8.5.41
> has been packaged with a previously accepted application installer,
> we are more comfortable rolling back.

Just to confirm: you see this high CPU usage with a clean install (no
additional web applications deployed, no configuration changes) on
Windows 2016 DataCenter (64-bit)?

If this is the case, it should be fairly easy to reproduce.

Mark

> I would appreciate it if this could be looked into.
>
> On Tue, 12 Nov 2019 at 11:27, M. Manna wrote:
>
>> Hey Mark (appreciate your response during US holiday time)
>>
>> On Tue, 12 Nov 2019 at 07:51, Mark Thomas wrote:
>>
>>> On November 12, 2019 12:54:53 AM UTC, "M. Manna" wrote:
>>>> Just to give an update again:
>>>>
>>>> 1) We reverted the APR to 1.2.21 - but observed no difference.
>>>> 2) We took 3 thread dumps at 1 minute intervals (without any user
>>>> sessions) - all of the threads are Tomcat's internal pool threads.
>>>>
>>>> When we checked the thread details (using fastthread.io) we didn't
>>>> see any of our application stack. Since there is no user traffic,
>>>> this is coming from Tomcat internally. At this stage, we cannot
>>>> really figure out the root cause.
>>>>
>>>> Any help is appreciated.
>>>
>>> Migrated from what (full version info please)?
>>>
>> From 8.5.41 to 8.5.45 (we migrate 3 times a year; the last was in
>> June).
>>
>>> Operating system exact version?
>>>
>> Microsoft Windows Server 2016 DataCentre (64-bit)
>>
>>> JRE vendor and exact version?
>>>
>> C:\jdk1.8.0\bin>java.exe -version
>> java version "1.8.0_162"
>> Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
>>
>>> Do you see the same behaviour with the latest 8.5.x and the latest
>>> Tomcat Native?
>>>
>> We are using APR 1.2.23, which I can also see in the latest Tomcat.
>> Due to production due diligence we cannot roll to a different
>> version that easily; normally we lag behind by two monthly Tomcat
>> releases. We also reverted the APR to 1.2.21 (but saw no
>> difference).
>>
>>> What triggers this behaviour?
>>>
>> That is quite strange. Due to the US holidays we had low traffic on
>> our servers, and nothing has crept in to suggest that it is
>> application-driven. We took one Tomcat instance out of the 50 and
>> removed all user sessions (i.e. no application activity or threads).
>> Upon restart of Tomcat, the CPU spike lingered past the initial
>> servlet startup period. We monitored it for 1-2 hours, but there was
>> no difference.
>>
>>> How often do you see this behaviour?
>>>
>> We took 2 sets of data:
>> 1) 3 jstack dumps at 10 second intervals.
>> 2) 3 jstack dumps at 1 minute intervals.
>>
>> Both of the above reveal that all of the background threads (http,
>> pool, etc.) were Tomcat's; no application threads lingered in those
>> 3 samples. So yes, we see this almost every time we take samples.
>> However, when we compared with pre-production instances (on Windows
>> Server 2012 R2, x64), we don't see such an abnormal spike; the
>> application instance doesn't incur a big CPU spike there. Whilst
>> composing this email, I am now wondering whether APR is indeed
>> incompatible with Windows Server 2016 (or with some Windows update
>> present on it) in a way that blocks in the native poll() call longer
>> than usual.
>>
>> An example: on Windows Server 2012 the APR poll() call takes about
>> 30% of CPU time, but on Windows Server 2016 it is almost always 95%.
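Per-thread CPU shares like the poll() percentages above can be derived
from two ThreadMXBean samples taken a known interval apart. Below is a
minimal sketch of that technique, assuming it runs inside the monitored
JVM (for a remote Tomcat you would reach the same MXBean over JMX) and
that the JVM supports thread CPU time measurement; the class name is
illustrative, not something from this thread.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Samples per-thread CPU time twice, 10 seconds apart, and prints each
// thread's share of one core over that window.
public class ThreadCpuSampler {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.err.println("Thread CPU time is not supported here");
            return;
        }
        mx.setThreadCpuTimeEnabled(true);

        // First sample: total CPU nanoseconds consumed so far, per thread.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        long windowMs = 10_000;   // matches the 10 second jstack interval
        Thread.sleep(windowMs);

        // Second sample: the delta over the wall-clock window, x100,
        // is that thread's CPU percentage of a single core.
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            if (start == null || start < 0 || end < 0) {
                continue;         // thread started or died mid-window
            }
            double pct = 100.0 * (end - start) / (windowMs * 1_000_000.0);
            ThreadInfo info = mx.getThreadInfo(id);
            if (info != null && pct >= 1.0) {
                System.out.printf("%-55s %6.2f%%%n", info.getThreadName(), pct);
            }
        }
    }
}

A poller thread spinning in native code shows up here as a high, stable
percentage across repeated windows, which matches what the jstack
samples in this thread describe.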
>>> And anything else you think might be relevant.
>>>
>> We are using end-to-end encryption via APR (with Certificate and
>> SSLConfig resources set up in server.xml), but that has survived the
>> past 3 Tomcat upgrades without any issue. Apart from the OS, we
>> don't have any obvious culprit identified at the moment.
>>
>> Thanks,
>>
>>> Mark
>>>
>>>> Thanks,
>>>>
>>>> On Mon, 11 Nov 2019 at 20:57, M. Manna wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> Any thoughts regarding this? We are slightly clueless at this
>>>>> point, so any direction will be appreciated.
>>>>>
>>>>> We are seeing the poll taking all of the CPU time. We are using
>>>>> OperatingSystemMXBean.getProcessCpuLoad() and
>>>>> OperatingSystemMXBean.getSystemCpuLoad() to get our metrics
>>>>> (then x100 to get the percentage).
>>>>>
>>>>> Thanks,
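For completeness, a minimal sketch of the metric collection described
above, assuming a HotSpot JDK 8 where ManagementFactory's
OperatingSystemMXBean is the com.sun.management subinterface (true on
Oracle and OpenJDK HotSpot builds); the class name is illustrative.

import java.lang.management.ManagementFactory;

// Reads process-wide and system-wide CPU load as fractions in
// [0.0, 1.0] and multiplies by 100 for a percentage, as described
// above. Both methods can return a negative value before the first
// measurement interval has elapsed.
public class CpuLoadProbe {
    public static void main(String[] args) throws InterruptedException {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        for (int i = 0; i < 5; i++) {
            double process = os.getProcessCpuLoad(); // this JVM only
            double system = os.getSystemCpuLoad();   // whole machine
            System.out.printf("process=%.1f%% system=%.1f%%%n",
                    process * 100.0, system * 100.0);
            Thread.sleep(1000);
        }
    }
}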
>>>>> On Mon, 11 Nov 2019 at 17:46, M. Manna wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> After migrating to 8.5.45, we are seeing a lot of CPU load from
>>>>>> the following thread in the JVM thread dump:
>>>>>>
>>>>>> "https-openssl-apr-0.0.0.0-8443-Poller" : 102 : RUNNABLE :
>>>>>> cpu=172902703125000 : cpuLoad= 74.181015
>>>>>> BlockedCount:8464 BlockedTime:0 LockName:null LockOwnerID:-1
>>>>>> LockOwnerName:null
>>>>>> WaitedCount:5397 WaitedTime:0 InNative:false IsSuspended:false
>>>>>> at org.apache.tomcat.jni.Poll.poll(Poll.java:-2)
>>>>>> at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1547)
>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>
>>>>>> This is consistent across 2-3 successive JVM dumps. Is this
>>>>>> something familiar to anybody?
>>>>>>
>>>>>> Thanks,
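One note on the dump above: if the cpu= field is thread CPU time in
nanoseconds, the unit ThreadMXBean.getThreadCpuTime() reports (an
assumption about the poster's tooling, though milliseconds would give
an impossible figure), it corresponds to roughly 48 hours of CPU time
consumed by this one Poller thread:

// Converts the cpu= value from the dump above into hours, assuming
// nanoseconds (ThreadMXBean.getThreadCpuTime units).
public class PollerCpuTime {
    public static void main(String[] args) {
        long cpuNanos = 172_902_703_125_000L;          // cpu= from the dump
        double seconds = cpuNanos / 1_000_000_000.0;   // 172,902.7 s
        System.out.printf("%.0f s = %.1f h of CPU%n",
                seconds, seconds / 3600.0);            // ~48.0 hours
    }
}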