tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: CPU high usage, the reason org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
Date Tue, 26 Nov 2019 18:51:44 GMT

Mladen,

On 11/26/19 11:13, Mladen Adamović wrote:
> I dug into the Tomcat source code and from what I've seen this is
> what happens:
> 
> A Tomcat "worker thread" tries to poll for work and, if it doesn't
> succeed, it invokes sun.misc.Unsafe.park (a wait which can be
> interrupted by another thread). On Linux this internally goes through
> the kernel function FUTEX_WAIT (the futex() definition is "The
> futex() system call provides a method for waiting until a certain
> condition becomes true.").

This is usually the underlying JVM implementation doing all of this; I
don't believe Tomcat calls Unsafe.park() directly. Theoretically,
park() is a blocking call, so even if the thread reports that it is
RUNNABLE it is really (effectively) BLOCKED, although it might be in a
slight busy-wait[1].
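As an aside, the park-while-idle behavior is easy to reproduce outside Tomcat. Here is a minimal stand-alone sketch (the class and thread names are mine, not from Tomcat) showing that a thread idling in LockSupport.park() reports WAITING, just like an idle Tomcat pool thread:

```java
// Hypothetical demo (not Tomcat code): a thread parked via
// LockSupport.park() reports the WAITING state while it is idle.
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread idle = new Thread(() -> {
            // Park until unparked, as a thread-pool worker does while
            // waiting for work; loop again on spurious wakeup.
            while (true) {
                LockSupport.park();
            }
        }, "idle-demo");
        idle.setDaemon(true);
        idle.start();
        Thread.sleep(300); // give the thread time to park
        System.out.println(idle.getState()); // WAITING
    }
}
```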

> However, these futex_wait calls sum to a high CPU load, although I
> don't think it's actually load here, as per the infamous Linux change
> of 29 Oct 1993 by Matthias Urlichs:
> http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html

Well, CPU load average doesn't tell the whole story. The real question
is how much work is being done per unit time. If you are observing a
performance *problem* then maybe we can look into that. But a high CPU
load average itself doesn't represent anything worrying IMHO. If you
have sustained high CPU load average, you probably need to be looking
at scaling-out, anyway, but you might not have a performance problem
... yet.
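If it helps, the load-average-versus-CPU-count comparison can be read straight from the JVM itself; a minimal sketch (the class name is mine):

```java
// Rough sketch: compare the 1-minute system load average against the
// number of available CPUs, as seen by the JVM.
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        // Returns -1.0 on platforms where the load average is unavailable.
        double load = os.getSystemLoadAverage();
        System.out.printf("1-min load %.2f across %d CPUs (%.2f per CPU)%n",
                load, cpus, load / cpus);
    }
}
```

A sustained per-CPU value well above 1.0 is the point at which I would start thinking about scaling out.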

I have a medium-busy service running 2-3 distinct web applications on
two application servers (~1k peak requests per minute in aggregate;
that includes static resources being served by web servers without
hitting Tomcat). The load-average on any given application server is
usually well under the number of CPUs. We are running a 4.9.0 kernel
and I'm not sure if the load-average is being scaled by the number of
CPUs or not, but my 1/5/15 load averages are all under 1.0 right now,
and the 13:00 hour (/now/ in my timezone) is when we start approaching
our peak daily load.

We have a very simple connector configuration which isn't much
different than the default:

    <Connector port="8015"
               redirectPort="443"
               protocol="org.apache.coyote.ajp.AjpNioProtocol"
               URIEncoding="UTF-8"
               packetSize="65536"
               executor="tomcatThreadPool" />

    <Connector port="8017"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               address="127.0.0.1"
               secure="true"
               URIEncoding="UTF-8"
               executor="tomcatThreadPool" />

We are using NIO (and not e.g. APR) for both connectors. The one with
the most traffic is the AJP connector which maintains a bunch of
(nominally) persistent connections from the small number of web servers.

I have many threads that are blocked on I/O, such as the client poller
for the HTTP connector:

"http-nio-127.0.0.1-8017-ClientPoller-0" #49 daemon prio=5 os_prio=0 tid=0x00007fc3901c4000 nid=0x74f runnable [0x00007fc359325000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000000d915c318> (a sun.nio.ch.Util$3)
        - locked <0x00000000d915c308> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000d915c0c0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:825)
        at java.lang.Thread.run(Thread.java:748)

Note that the thread state says "runnable" (and "RUNNABLE") but it's
essentially blocked.
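You can reproduce that "RUNNABLE but really blocked" state with a few lines of NIO; a minimal sketch (class and thread names are mine):

```java
// Hypothetical demo: a thread blocked in Selector.select() (native
// epoll_wait on Linux) still reports RUNNABLE, because the JVM cannot
// see into the native call.
import java.io.IOException;
import java.nio.channels.Selector;

public class SelectDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Thread poller = new Thread(() -> {
            try {
                selector.select(); // blocks until an event or wakeup()
            } catch (IOException ignored) {
            }
        }, "poller-demo");
        poller.setDaemon(true);
        poller.start();
        Thread.sleep(300); // let the thread enter select()
        System.out.println(poller.getState()); // RUNNABLE, despite blocking
        selector.wakeup();
        poller.join(1000);
        selector.close();
    }
}
```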

The request-processing threads waiting for work are in a similar state:

"catalina-exec-25" #44 daemon prio=5 os_prio=0 tid=0x00007fc3901c2000 nid=0x74a waiting on condition [0x00007fc35982a000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000d915d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103)
        at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

... although this one says "parking"/"WAITING" which is a more
appropriate description of what's happening.

The above thread states can both be thought of as "asleep".

So perhaps the "problem" isn't so much that you are seeing lots of
threads in these states, but that you have a lot of requests actually
being processed, and they are being processed *quickly*. That's a good
thing. :)

Similar to the way that high memory-usage isn't a bad thing (memory is
there /precisely/ to be used!), a high CPU usage shouldn't necessarily
be considered a big problem.

You never actually said what your definition of "high CPU usage" was.
Are you looking at load-average? %-CPU usage from "top"?
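Either way, it's worth finding out *which* threads are actually burning CPU, as opposed to merely being parked. One way is to compare per-thread CPU time via ThreadMXBean; a rough sketch (the class name is mine):

```java
// Rough sketch: dump each live thread's name, state, and accumulated
// CPU time. Threads that are parked/epoll-blocked accumulate almost
// no CPU time even though some report RUNNABLE.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class TopThreads {
    public static void main(String[] args) {
        ThreadMXBean tm = ManagementFactory.getThreadMXBean();
        if (!tm.isThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time not supported on this JVM");
            return;
        }
        for (long id : tm.getAllThreadIds()) {
            ThreadInfo info = tm.getThreadInfo(id);
            long cpuNanos = tm.getThreadCpuTime(id); // -1 if disabled/dead
            if (info != null && cpuNanos >= 0) {
                System.out.printf("%-40s %-13s %10.1f ms%n",
                        info.getThreadName(), info.getThreadState(),
                        cpuNanos / 1e6);
            }
        }
    }
}
```

Running something like this (or just `top -H` plus a thread dump) twice, a few seconds apart, shows which threads' CPU time is actually growing.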

Do you have the option of comparing your CPU usage when you use APR
versus NIO? I would expect your CPU usage to go *up* by switching from
APR to NIO due to two factors:

1. NIO is slightly less CPU-efficient than APR when it comes to
polling and data transfer (per performance testing by myself and
jfclere; see Tomcat's "presentations" page for data and conclusions)

2. NIO uses JSSE which may or may not be using OpenSSL. Ideally,
you'll configure it to use OpenSSL but the pure-Java cryptographic
provider has terrible performance relative to OpenSSL.
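For reference, and assuming a Tomcat version with the tcnative library available on the library path (this is a sketch, not your configuration; the port and certificate paths are placeholders), an NIO connector can be pointed at OpenSSL for the TLS implementation with something like:

```xml
    <!-- Untested sketch: NIO connector using the OpenSSL engine for TLS -->
    <Connector port="8443"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               SSLEnabled="true"
               sslImplementationName="org.apache.tomcat.util.net.openssl.OpenSSLImplementation">
        <SSLHostConfig>
            <Certificate certificateKeyFile="conf/localhost-key.pem"
                         certificateFile="conf/localhost-cert.pem" />
        </SSLHostConfig>
    </Connector>
```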

Do you have any scope to separate-out TLS termination from the
application server? That is, use a reverse-proxy for TLS termination
and proxy to another server that only handles the application logic?
You may find that TLS is just "expensive" in terms of CPU which, well,
is simply the truth.

(Note that my servers, which are not configured to terminate TLS, are
running stunnel as a separate process to proxy those AJP connections
through an encrypted tunnel, so while the configuration does not show
it, my server is still performing all that necessary cryptographic
work on the same set of CPUs as my Tomcat instances are using.)

Hope that helps,
- -chris

[1] https://en.wikipedia.org/wiki/Busy_waiting

> On Tue, Nov 26, 2019 at 4:50 PM Christopher Schultz < 
> chris@christopherschultz.net> wrote:
> 
> Mladen,
> 
> On 11/25/19 14:36, Mladen Adamović wrote:
>>>> On Mon, Nov 25, 2019 at 5:57 PM Christopher Schultz < 
>>>> chris@christopherschultz.net> wrote:
>>>> 
>>>>>> We certainly want to be able to serve 10000 hits per
>>>>>> second (!), while some connections might be stalled.
>>>>> 
>>>>> What might stall a connection? The network, or the
>>>>> application (or database, etc.)?
>>>>> 
>>>> 
>>>> Underlying (synchronized) monitors could stall every thread,
>>>> the network, whatever.
>>>> 
>>>> The network itself demands a large number of connections,
>>>> i.e. the current situation at the server (displaying only remote
>>>> connections):
>>>> 
>>>> root@condor1796 ~ # netstat -tnp | grep -v "127.0.0" | wc -l
>>>> 1220
> 
> Note this is every connection, bound port, and connection being
> cleaned up that the kernel knows about; not just established/active
> connections to your application specifically.
> 
>>>> If we now have 1220, we definitely need at least 10000
>>>> active connections for Tomcat and I don't see that setting
>>>> this to 50000 is a bad idea.
> 
> Okay. I think you need a reverse proxy and more servers if you
> think 50000 is going to be your peak load.
> 
>>>>> For real DDOS protection, you need a provider who can
>>>>> handle lots of traffic and respond quickly by black-holing
>>>>> that kind of traffic as
>>>> 
>>>> Depending on how large server farm they use (hypothetically).
>>>> We want to be able to survive some DDoS attacks. If we limit
>>>> the number of concurrent connections by IP address and the
>>>> number of connections per second, that's some DoS
>>>> protection.
> 
> But honestly, this is better done at another layer of the network;
> not at the host-level.
> 
>>>> Regarding network delays, out of the currently 1220 active
>>>> remote connections, most of them are in the TIME_WAIT state.
>>>> Lowering the TIME_WAIT setting in Linux is not recommended.
> 
> Hmm. Lots of TIME_WAIT connections isn't good. I actually don't
> know if they count "against" your 50000 limit in the Java process.
> 
> -chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

