tomcat-users mailing list archives

From André Warnier <...@ice-sa.com>
Subject Re: Connection count explosion due to thread http-nio-80-ClientPoller-x death
Date Thu, 26 Jun 2014 15:09:38 GMT
Lars Engholm Johansen wrote:
> Thanks for all the replies guys.
> 
>> Have you observed a performance increase by setting
>> acceptorThreadCount to 4 instead of a lower number? I'm just curious.
> 
> 
> No, but this was the consensus after lengthy discussions in my team. We
> have 12 CPU cores - better safe than sorry. I know that the official docs
> read "although you would never really need more than 2" :-)
> 
>> The GC that Andre suggested was to get rid of some of the CLOSE_WAIT
>> connections in netstat output, in case those are owned by some
>> abandoned and not properly closed I/O classes that are still present
>> in JVM memory.
> 
> 
> Please check out the "open connections" graph at http://imgur.com/s4fOUte
> As far as I interpret it, we only have a slight connection count growth
> during the days until the poller thread dies. These may or may not
> disappear by forcing a GC, but the amount is not problematic until we hit
> the http-nio-80-ClientPoller-x thread death.

Just to make sure: what kind of connections does this graph actually show? In which TCP 
state? Does it count only the "ESTABLISHED" ones, or also "FIN_WAIT", "CLOSE_WAIT", 
"LISTEN", etc.?

> 
>> The insidious part is that everything may look fine for a long time (apart
>> from an occasional long list of CLOSE_WAIT connections).  A GC will happen
>> from time to time (*), which will get rid of these connections.  And those
>> CLOSE_WAIT connections do not consume a lot of resources, so you'll never
>> notice.
>> Until at some point, the number of these CLOSE_WAIT connections gets just
>> at the point where the OS can't swallow any more of them, and then you have
>> a big problem.
>> (*) and this is the "insidious squared" part : the smaller the Heap, the
>> more often a GC will happen, so the sooner these CLOSE_WAIT connections
>> will disappear.  Conversely, by increasing the Heap size, you leave more
>> time between GCs, and make the problem more likely to happen.
> 
> 
> You are correct. The bigger the heap size, the less often a GC will happen -
> and we have set aside 32 GiB of RAM. But again, referring to my "connection
> count" graph, a missing close in the code does not seem to be the culprit.
> 
>> A critical error (java.lang.ThreadDeath,
>> java.lang.VirtualMachineError) will cause the death of a thread.
>> A subtype of the latter is java.lang.OutOfMemoryError.
> 
> 
> I just realized that StackOverflowError is also a subclass of
> VirtualMachineError,
> and remembered that, for company-historical reasons, we had configured the
> JVM stack size to 256 KiB (down from the default 1 MiB on 64-bit machines).
> This was to support a huge number of threads on limited memory in the past.
> I have now removed the -Xss JVM parameter and am excited to see if this
> solves our poller thread problems.
> Thanks for the hint, Konstantin.
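
To illustrate why -Xss matters here: each thread gets its own stack, so a
small -Xss makes deep call chains hit StackOverflowError much sooner, and
if nothing catches that error it kills the thread. A minimal, hypothetical
demo (run it with -Xss256k and again with the default to compare the depth
reached):

    // StackDepthDemo.java - hypothetical example, not Tomcat code.
    public class StackDepthDemo {
        private static int depth = 0;

        private static void recurse() {
            depth++;
            recurse();
        }

        public static void main(String[] args) throws Exception {
            Thread t = new Thread(() -> {
                try {
                    recurse();
                } catch (StackOverflowError e) {
                    // If this were not caught, the error would simply
                    // kill the thread, as discussed below.
                    System.out.println("Overflow at depth " + depth);
                }
            });
            t.start();
            t.join();
        }
    }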
> 
> I promise to report back to you guys :-)
> 
> 
> 
> On Fri, Jun 20, 2014 at 2:49 AM, Filip Hanik <filip@hanik.com> wrote:
> 
>> "Our sites still functions normally with no cpu spikes during this build up
>> until around 60,000 connections, but then the server refuses further
>> connections and a manual Tomcat restart is required."
>>
>> yes, the connection limit is a 16-bit count (2^16 = 65,536) minus some
>> reserved ports. So your system should become unresponsive: you have run
>> out of ports (the port number is a 16-bit value in a TCP connection).
>>
>> netstat -na should give you your connection state when this happens, and
>> that is helpful debug information.
>>
>> Filip
>>
>>
>>
>>
>> On Thu, Jun 19, 2014 at 2:44 PM, André Warnier <aw@ice-sa.com> wrote:
>>
>>> Konstantin Kolinko wrote:
>>>
>>>> 2014-06-19 17:10 GMT+04:00 Lars Engholm Johansen <larsjo@gmail.com>:
>>>>
>>>>> I will try to force a GC next time I am at the console about to
>>>>> restart a Tomcat where one of the http-nio-80-ClientPoller-x threads
>>>>> has died and connection count is exploding.
>>>>>
>>>>> But I do not see this as a solution - can you somehow deduce why this
>>>>> thread died from the outcome of a GC?
>>>>>
>>>> Nobody said that a thread died because of GC.
>>>>
>>>> The GC that Andre suggested was to get rid of some of the CLOSE_WAIT
>>>> connections in netstat output, in case those are owned by some
>>>> abandoned and not properly closed I/O classes that are still present
>>>> in JVM memory.
>>>>
>>> Exactly, thanks Konstantin for clarifying.
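
For what it's worth, such a GC can be requested from the console without
touching the application, e.g. with the jcmd tool that ships with the JDK
since Java 7:

    jcmd <tomcat-pid> GC.run

where <tomcat-pid> is the process id of the Tomcat JVM.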
>>>
>>> I was going by the following in the original post:
>>>
>>> "Our sites still functions normally with no cpu spikes during this build
>> up
>>> until around 60,000 connections, but then the server refuses further
>>> connections and a manual Tomcat restart is required."
>>>
>>> CLOSE_WAIT is a normal state for a TCP connection, but it should not
>>> normally last long.
>>> It indicates basically that the other side has closed the connection, and
>>> that this side should do the same. But it doesn't, and as long as it
>>> doesn't, the connection remains in the CLOSE_WAIT state.  It's like
>>> "half-closed", but not entirely, and as long as it isn't, the OS cannot
>>> get rid of it.
>>> For a more precise explanation, Google for "TCP CLOSE_WAIT state".
>>>
>>> I have noticed in the past, with some Linux versions, that when the
>>> number of such CLOSE_WAIT connections goes above a certain level (several
>>> hundred), the TCP/IP stack can become totally unresponsive and not accept
>>> any new connections at all, on any port.
>>> In my case, this was due to the following kind of scenario:
>>> Some class Xconnection is instantiated, and upon creation the object
>>> opens a TCP connection to something. This object is now used as an
>>> "alias" for this connection.  Time passes, and finally the object goes
>>> out of scope (e.g. the reference to it is set to "null"), and one may
>>> believe that the underlying connection gets closed as a side-effect.
>>> But it doesn't, not as long as this object is not actually
>>> garbage-collected, which triggers the actual object destruction and the
>>> closing of the underlying connection.
>>> Forcing a GC is one way to provoke this (restarting Tomcat is another,
>>> more drastic, way).
>>>
>>> If a forced GC gets rid of your many CLOSE_WAIT connections and makes
>>> your Tomcat operative again, that would be a sign that something similar
>>> to the above is occurring; and then you would need to look in your
>>> application for the oversight (e.g. the class should have a "close"
>>> method (closing the underlying connection), which should be invoked
>>> before letting the object go out of scope).
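
To make that concrete, a minimal sketch of such a class - the names are
hypothetical, following the "Xconnection" example above:

    import java.io.IOException;
    import java.net.Socket;

    // Hypothetical sketch of the pattern described above: the wrapper
    // opens a TCP connection on creation.  If close() is never called
    // and the reference is simply dropped, the underlying socket stays
    // open (stuck in CLOSE_WAIT once the peer closes its end) until a
    // GC finally collects the abandoned object.
    public class Xconnection {
        private final Socket socket;

        public Xconnection(String host, int port) throws IOException {
            this.socket = new Socket(host, port);
        }

        // The fix: an explicit close(), invoked before the object
        // goes out of scope.
        public void close() throws IOException {
            socket.close();
        }
    }

Making such a class implement java.lang.AutoCloseable also lets callers
use try-with-resources, so the close cannot be forgotten.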
>>>
>>> The insidious part is that everything may look fine for a long time
>>> (apart from an occasional long list of CLOSE_WAIT connections).  A GC
>>> will happen from time to time (*), which will get rid of these
>>> connections.  And those CLOSE_WAIT connections do not consume a lot of
>>> resources, so you'll never notice.
>>> Until at some point, the number of these CLOSE_WAIT connections gets just
>>> at the point where the OS can't swallow any more of them, and then you
>>> have a big problem.
>>>
>>> That sounds a bit like your case, doesn't it ?
>>>
>>> (*) and this is the "insidious squared" part : the smaller the Heap, the
>>> more often a GC will happen, so the sooner these CLOSE_WAIT connections
>>> will disappear.  Conversely, by increasing the Heap size, you leave more
>>> time between GCs, and make the problem more likely to happen.
>>>
>>>
>>> I believe that the rest below may be either a consequence, or a red
>>> herring, and I would first eliminate the above as a cause.
>>>
>>>
>>>
>>>>> And could an Exception/Error in Tomcat thread http-nio-80-ClientPoller-0
>>>>> or http-nio-80-ClientPoller-1 make the thread die with no stacktrace in
>>>>> the Tomcat logs?
>>>>>
>>>>>
>>>> A critical error (java.lang.ThreadDeath,
>>>> java.lang.VirtualMachineError) will cause the death of a thread.
>>>>
>>>> A subtype of the latter is java.lang.OutOfMemoryError.
>>>>
>>>> As of now, such errors are passed through and are not logged by
>>>> Tomcat, but are logged by java.lang.ThreadGroup.uncaughtException().
>>>> ThreadGroup prints them to System.err (catalina.out).
>>>>
>>>>
>>>> Best regards,
>>>> Konstantin Kolinko
>>>>
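
One way to make such thread deaths visible is to install a default
uncaught-exception handler early at startup; ThreadGroup.uncaughtException()
delegates to it when one is set. A minimal sketch (class and method names
hypothetical):

    // ThreadDeathLogger.java - hypothetical helper, not Tomcat code.
    public class ThreadDeathLogger {
        public static void install() {
            Thread.setDefaultUncaughtExceptionHandler((thread, err) -> {
                // Log any Throwable that escapes a thread, so a dying
                // poller thread at least leaves a stack trace.
                System.err.println("Thread " + thread.getName() + " died:");
                err.printStackTrace(System.err);
            });
        }
    }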
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

