From users-return-268744-archive-asf-public=cust-asf.ponee.io@tomcat.apache.org Wed Nov 13 16:21:20 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 7CA9E180621 for ; Wed, 13 Nov 2019 17:21:20 +0100 (CET) Received: (qmail 29636 invoked by uid 500); 13 Nov 2019 16:21:16 -0000 Mailing-List: contact users-help@tomcat.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: "Tomcat Users List" Delivered-To: mailing list users@tomcat.apache.org Received: (qmail 29625 invoked by uid 99); 13 Nov 2019 16:21:16 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Nov 2019 16:21:16 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 92019C056B for ; Wed, 13 Nov 2019 16:21:15 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.001 X-Spam-Level: X-Spam-Status: No, score=0.001 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-he-de.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id P-K1vwXzQ99j for ; Wed, 13 Nov 2019 16:21:11 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2607:f8b0:4864:20::133; helo=mail-il1-x133.google.com; envelope-from=manmedia@gmail.com; receiver= Received: from mail-il1-x133.google.com (mail-il1-x133.google.com [IPv6:2607:f8b0:4864:20::133]) by mx1-he-de.apache.org (ASF Mail Server at mx1-he-de.apache.org) with ESMTPS id A85A67DD5E for ; Wed, 13 Nov 2019 16:21:10 +0000 (UTC) Received: by mail-il1-x133.google.com with SMTP id a7so2309026ild.6 for ; Wed, 13 Nov 2019 08:21:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=fxYVonuYxnsJuqmeL63nfqSbWeodD1i6svDAl+Y/W50=; b=gMUCP0NL2q++ozQqK5PfEz00Y7sLdGj3kMClsldvl3P2/p4Xh6zEBo9Dphnd9SnDw1 9D5OmYtrJTO0GvbRrqm3ev0KYNTGXH3xMY1dc+0eEhucR/jQcQzt5ih0JdPYbf2llx2l xo4lNL2nQG43aOPbhMpvgjWnvuWP8t5rxWOY3HZoKPs12Pgh4OJeVq2MaJsKu3pMyMuD jA4ASjmJlC4HySzmBq94FaxLIWR2B49Vfltjzy8Er50ePL0NFN/RSzw92Jy2MMRrDQv5 Q46BxHK1bPvIxEqLMzcWxCXIZ4eulO3WbhiRhyiXmDmiFIu9PFXbYPJ/1aj782H/6KB3 QTag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=fxYVonuYxnsJuqmeL63nfqSbWeodD1i6svDAl+Y/W50=; b=HeKHIE+lbZu3Nh8aGHsa06piyYRSccnsW0/F3Tzo6DRico5JBA/Lh/vbCtYFJ7OBDi oW+UqUzYesHFvlXJJx0lBEBJ8Cf/H2ig04N3nPd7ElDvkvhObTt/nhi/VjFOmEwATamB VhozCO5dRLyFfr4eHrGFjtCzYtU29wr6MSCg4WLu2fbCpfGlyCisLAmw377TtoHrVXVm 9IW4FxQA//cRjZlL5D0QxLFMy66I0DbgwSIQFPYzIJMxgJ6LPdxZFt3VViTwVdH34YNt 3LI+Jb3PhuenJbJEJBpmQh98TqGhDKKGLd7QH43ySbVsmZ04/pO72pNbjtiRy/k1llYV LLhg== X-Gm-Message-State: APjAAAWapC/ivJ/zJctOtLW9ekXJBpP49myl0yvdmqKHZ/TW+2C0Bae0 KWQVXAa3NrWnOVPDPYaEinmVBqmW5881IXwwE3lCh6tc7Hg= X-Google-Smtp-Source: APXvYqxpVE26KUYQRuo9NUJn8s1SN7opg6YmLEU8TLTZ0SbYLZQ7H6yG5z+TyNkQbdwSYW7aEY4OBMcNT5BH60XyFXs= X-Received: by 2002:a92:7e4a:: with SMTP id z71mr5092924ilc.141.1573662069044; Wed, 13 Nov 2019 08:21:09 -0800 (PST) MIME-Version: 1.0 References: <0F7034A9-A36D-4782-A3EC-48F58F4A5B72@apache.org> <57d65705-9eef-5644-4e12-a949eec57871@apache.org> In-Reply-To: <57d65705-9eef-5644-4e12-a949eec57871@apache.org> From: "M. Manna" Date: Wed, 13 Nov 2019 16:20:58 +0000 Message-ID: Subject: Re: tomcat thread incurring CPU load To: Tomcat Users List Content-Type: multipart/alternative; boundary="0000000000009f3a6e05973cc03b" --0000000000009f3a6e05973cc03b Content-Type: text/plain; charset="UTF-8" HI Mark, On Wed, 13 Nov 2019 at 15:38, Mark Thomas wrote: > On 12/11/2019 19:11, M. Manna wrote: > > HI Mark, > > > > following my previous reply, we have now confirmed that it's indeed > 8.5.45 > > with APR 1.2.23 that's causing such high JVM CPU usage. > > We used took out 2 out of 50 servers from the load balancer config, > > reverted tomcat, and redeployed. With near to identical user traffic, the > > two servers are responding normally without/without traffic with 8.5.41. > > The JVM dump looks a lot better with 8.5.41. > > > > We do think that the recent changes in APR and some other tomcat jar may > > have caused compatibility issue on Windows server 2016 (64-bit) platform. > > But unfortunately, we cannot pinpoint exactly what change may have caused > > this (i.e. actual OS vs Security Updates). With this in mind, we are also > > being wary to move to 8.5.47 as we don't know if the same issue will > occur > > again. Since 8.5.41 has been packaged with previously accepted > application > > installer, we are more comfortable rolling back. > > Just to confirm, you see this high CPU usage with a clean install (no > additional web applications deployed, no configuration changes) on > Windows 2016 DataCenter (64-bit)? > > If this is the case, it should be fairly easy to reproduce. > > Mark > > We do not deploy multiple applications. In fact, Under tomcat webapps/ROOT we only have one application (ours). Each tomcat instance is hosted on a VM (total 50) and all of them are identically configured (server.xml, web.xml, logging, CPU/RAM). We have not made any other configuration change between 8.5.41 and 8.5.45. And yes, I agree with you that it's fairly easy to reproduce. Thanks, > > > > > I would appreciate if this can be looked into. > > > > On Tue, 12 Nov 2019 at 11:27, M. Manna wrote: > > > >> Hey Mark (appreciate your response in US holiday time) > >> > >> On Tue, 12 Nov 2019 at 07:51, Mark Thomas wrote: > >> > >>> On November 12, 2019 12:54:53 AM UTC, "M. Manna" > >>> wrote: > >>>> Just to give an update again: > >>>> > >>>> 1) We reverted the APR to 1.2.21 - but observed no difference. > >>>> 2) We took 3 thread dumps over 1 min interval (without any user > >>>> sessions) - > >>>> All threads are tomcat's internal pool threads. > >>>> > >>>> When we checked the thread details (using fasthread.io) - we didn't > see > >>>> any > >>>> of our application stack. Since there is no user traffic, this is > >>>> coming > >>> >from tomcat internally. At this stage, we cannot really figure out > >>>> what's > >>>> the root cause. > >>>> > >>>> Any help is appreciated. > >>> > >>> Migrated from what (full version info please)? > >>> > >> from 8.5.41 to 8.5.45 (we migrate 3 times a year, last was in June) > >> > >>> > >>> Operating system exact version? > >>> > >> Microsoft Windows Server 2016 DataCentre (64-bit) > >> > >>> > >>> JRE vendor and exact version? > >>> > >> C:\jdk1.8.0\bin>java.exe -version > >> java version "1.8.0_162" > >> Java(TM) SE Runtime Environment (build 1.8.0_162-b12) > >> Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode) > >> > >> > >>> Do you see the same behavior with the latest 8.5.x and latest Tomcat > >>> Native? > >>> > >> We are using APR 1.2.23 which I can also see in latest tomcat. Due to > >> production due diligence we cannot roll to a different version that > easily. > >> Normally, we lag behind by 2 monthly releases of tomcat. We also > reverted > >> the APR to 1.2.21 (but no difference). > >> > >>> > >>> What triggers this behaviour? > >>> > >> That is quite strange. Due to US holidays, we had a low traffic on our > >> servers, and nothing has crept in to suggest that it's > application-driven. > >> We took one tomcat instance out of 50 instances and removed all user > >> sessions (i.e. no application activities or threads). Upon restart of > >> tomcat, the CPU spike lingered past the initial servlet startup period. > We > >> monitored that over 1-2 hours but there was no difference. > >> > >>> > >>> How often do you see this behaviour? > >>> > >> We took 2 sets of data > >> 1) 3 Jstack dump based on 10 seconds interval. > >> 2) 3 jstack dump based on 1 min interval. > >> > >> Both the above reveals that all background threads (http, pool etc.) > were > >> from tomcat. We didn't have any application threads lingered in those 3 > >> samples. So yes we see this almost all the time if we take samples. > >> However, when we compared with pre-production instances (with Windows > >> server R2 x64 bit), we don't see such abnormal spike. In fact, the > >> application instance doesn't incur such a big CPU spike. Whilst > composing > >> this email, I am now thinking if the APR is indeed incompatible with > >> WIndows Server R2 (or the presence of any Windows Updates) which blocks > the > >> native poll() call longer than usual. > >> > >> An example is that on Windows Server 2012 - APR poll() call takes about > >> 30% CPU time - but with Windows Server 2016 it's almost always 95%. > >> > >> > >>> > >>> And anything else you think might be relevant. > >>> > >> > >> We are using end-2-end encryption using APR (with Certificate and > >> SSLConfig resource setup in server.xml). But it's survived past 3 tomcat > >> upgrades without any issue. > >> Except OS we don't have any obvious culprit identified at the moment. > >> > >> Thanks, > >> > >>> > >>> Mark > >>> > >>>> > >>>> Thanks, > >>>> > >>>> On Mon, 11 Nov 2019 at 20:57, M. Manna wrote: > >>>> > >>>>> Hello All, > >>>>> > >>>>> Any thoughts regarding this? Slightly clueless at this point, so any > >>>>> direction will be appreciated. > >>>>> > >>>>> We are seeing the poll taking all the CPU time. We are using > >>>>> OperatingSystemMXBean.getProcessCpuLoad() and > >>>>> OperatingSystemMXBean.getSystemCpuLoad() to get our metrics (then > >>>> x100 to > >>>>> get the pct). > >>>>> > >>>>> Thanks, > >>>>> > >>>>> > >>>>> On Mon, 11 Nov 2019 at 17:46, M. Manna wrote: > >>>>> > >>>>>> Hello, > >>>>>> > >>>>>> after migrating to 8.5.45, we are seeing a lot of cpu load by > >>>> following > >>>>>> JVM thread dump: > >>>>>> > >>>>>> "https-openssl-apr-0.0.0.0-8443-Poller" : 102 : RUNNABLE : > >>>>>> cpu=172902703125000 : cpuLoad= 74.181015 > >>>>>> > >>>>>> BlockedCount:8464 BlockedTime:0 LockName:null LockOwnerID:-1 > >>>>>> LockOwnerName:null > >>>>>> > >>>>>> WaitedCount:5397 WaitedTime:0 InNative:false IsSuspended:false at > >>>>>> org.apache.tomcat.jni.Poll.poll(Poll.java:-2) > >>>>>> > >>>>>> at > >>>>>> > >>>> org.apache.tomcat.util.net > .AprEndpoint$Poller.run(AprEndpoint.java:1547) > >>>>>> > >>>>>> at java.lang.Thread.run(Thread.java:748) > >>>>>> > >>>>>> > >>>>>> These are coming after 2-3 successful jvm dump. Is this something > >>>>>> familiar to anybody? > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>> > >>> > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org > >>> For additional commands, e-mail: users-help@tomcat.apache.org > >>> > >>> > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org > For additional commands, e-mail: users-help@tomcat.apache.org > > --0000000000009f3a6e05973cc03b--