tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject CLOSE_WAIT and what to do about it
Date Wed, 08 Apr 2009 10:32:57 GMT
Hi.
As a follow-up to another thread originally entitled "apache/tomcat 
communication issues (502 response)", I'd like to pursue the CLOSE_WAIT 
subject.

Sorry if this post is a bit long; I want to make sure that I provide 
all the necessary information.

Like the original poster, I am seeing on my systems a fair number of 
sockets apparently stuck for a long time in the CLOSE_WAIT state 
(sometimes several hundred of them).
They seem to predominantly concern Tomcat and other java processes, but 
as Alan pointed out previously, and as I can confirm, my perspective is 
slanted: we use a lot of common java programs and webapps on our 
servers, and the ones mostly affected talk to each other and come from 
the same vendor.
Unfortunately, I also do not have the source code of these 
programs/webapps, will not get it, and cannot do without these programs.

It has been previously established that a socket lingering for a long 
time in the CLOSE_WAIT state means that the other end of the TCP 
connection has already closed its side, but the local program owning 
the socket has not closed its own side when it is done with it.
I also surmise (without definite proof) that this is essentially 
"bad", as it ties up resources (at the very least a file descriptor in 
the owning process) that could otherwise be freed.
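To illustrate the mechanism, here is a minimal Java sketch. The class and method names are hypothetical, made up for this illustration. A short-lived "server" thread sends a 12-byte reply and then closes its end, which sends a FIN. The client sees that FIN as `read()` returning -1. From the moment the FIN arrives until the client calls `close()`, the kernel lists the client's socket as CLOSE_WAIT. Code that never reaches `close()` leaves it there indefinitely.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CloseWaitDemo {
    /** Connects to a peer that replies and closes; returns the bytes read before end-of-stream. */
    static int readUntilPeerCloses() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Stand-in for the short-lived server: send a 12-byte reply, then close (sends a FIN).
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    out.write("hello, world".getBytes(StandardCharsets.US_ASCII));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            peer.start();
            int total = 0;
            // try-with-resources guarantees close(), which is what takes the socket out of CLOSE_WAIT.
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 InputStream in = client.getInputStream()) {
                byte[] buf = new byte[64];
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
                // read() returned -1: the peer's FIN has arrived, so the kernel now lists this
                // socket as CLOSE_WAIT until close() runs at the end of this block.
            }
            peer.join();
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes read before end-of-stream: " + readUntilPeerCloses());
    }
}
```

Running it prints `bytes read before end-of-stream: 12`. If the inner try-with-resources were replaced by a plain `new Socket(...)` that is never closed, the socket would remain in CLOSE_WAIT for as long as the JVM keeps the descriptor open.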
I have also been told, or discovered, that since our servers run 
Debian Linux, programs such as "ps", "netstat" and "lsof" can help 
determine precisely how many such lingering sockets there are and, to 
some extent, which processes are the culprits.
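The per-process tally can be scripted from netstat. Below is a minimal Java sketch. It assumes `netstat -pan` output in the usual net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name) is piped to standard input. `CloseWaitCount` is a hypothetical name. Without root, the PID column shows "-" for sockets belonging to other users, so those would be counted together under "-".

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CloseWaitCount {
    /** Counts CLOSE_WAIT sockets per "PID/Program name" column of `netstat -pan` output. */
    static Map<String, Integer> countByProcess(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Columns: Proto Recv-Q Send-Q Local Foreign State PID/Program; "unix" rows are skipped.
            if (f.length >= 7 && f[0].startsWith("tcp") && f[5].equals("CLOSE_WAIT")) {
                counts.merge(f[6], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines().collect(Collectors.toList());
        countByProcess(lines).forEach((owner, n) -> System.out.println(n + "\t" + owner));
    }
}
```

After compiling, it would be used as `netstat -pan | java CloseWaitCount`. Run as root so that every socket shows its owner.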

In our case, we know which programs are involved, because we know which 
ones open a listening socket and on what fixed port, and we also know 
which other processes talk to them.
As mentioned above, we do not have the source of these programs and 
will not get it, but we cannot practically do without them for now. We 
do, however, have full root control of the Linux servers where these 
programs are running.

So my question is: considering the situation above, is there something 
I can do locally to free these lingering CLOSE_WAIT sockets, and under 
which conditions?
(I must admit that I am a bit lost among the myriad options of lsof.)

For example, suppose I start with a "netstat -pan" command and I see the 
display below (sorry for the line-wrapping).
I see a number of sockets in the CLOSE_WAIT state, and for those I have 
a process-id, which I can associate to a particular process.
For example, I see this line:
tcp6      12      0 ::ffff:127.0.0.1:41764  ::ffff:127.0.0.1:11002  CLOSE_WAIT  29649/java
which tells me that there is a local process 29649/java with a "local" 
socket on port 41764 in the CLOSE_WAIT state, connected to port 11002 
on the same host.
On the other hand, I see this line:
tcp        0      0 127.0.0.1:11002         127.0.0.1:41764         FIN_WAIT2   -
which shows a "local" socket on port 11002, connected to the other 
local port 41764, with no process-id/program displayed.
What does that tell me?
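One reading of this pair, stated with some caution:
- The port 11002 end is in FIN_WAIT2. It has sent its FIN and had it acknowledged, and it is waiting for the FIN from port 41764. That FIN will only come when the java process closes its socket.
- The "-" most likely means that no process holds that socket any more. The inetd-spawned program has exited, and the kernel keeps the orphaned socket on its own.
- Linux aborts orphaned FIN_WAIT2 sockets after `net.ipv4.tcp_fin_timeout` (60 seconds by default). That would explain CLOSE_WAIT entries such as 41711 or 38989, which have no counterpart in the listing.
- The Recv-Q column of 12 (or 1) on the CLOSE_WAIT lines is the number of bytes the peer sent that the owning java process has not yet read.

The pairing can be done mechanically with a minimal Java sketch. The class and method names are hypothetical. It strips the IPv4-mapped `::ffff:` prefix and, for each CLOSE_WAIT line, looks for the line with local and foreign addresses swapped. It assumes one netstat entry per line. Addresses that netstat has truncated (such as `212.85.38.`) will not match.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class PairSockets {
    /** Drops the IPv4-mapped IPv6 prefix so "::ffff:127.0.0.1:11002" matches "127.0.0.1:11002". */
    static String norm(String addr) {
        return addr.startsWith("::ffff:") ? addr.substring(7) : addr;
    }

    /** Returns {local, foreign, state, owner} for each TCP line of `netstat -pan`. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 7 && f[0].startsWith("tcp")) {
                rows.add(new String[] {norm(f[3]), norm(f[4]), f[5], f[6]});
            }
        }
        return rows;
    }

    /** For each CLOSE_WAIT socket, reports the state of the opposite end, if netstat still lists it. */
    static List<String> describeCloseWaits(List<String> lines) {
        List<String[]> rows = parse(lines);
        List<String> report = new ArrayList<>();
        for (String[] cw : rows) {
            if (!cw[2].equals("CLOSE_WAIT")) continue;
            String peer = "absent";
            for (String[] other : rows) {
                // The other end of the same connection has local and foreign swapped.
                if (other[0].equals(cw[1]) && other[1].equals(cw[0])) peer = other[2];
            }
            report.add(cw[3] + " " + cw[0] + " -> " + cw[1] + " peer=" + peer);
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines().collect(Collectors.toList());
        describeCloseWaits(lines).forEach(System.out::println);
    }
}
```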

I also know that process-id 29649 corresponds to a local java process, 
a multi-threaded daemon. That program "talks to" another known server 
program, written in C, instances of which are started on an ad-hoc 
basis by inetd, and which "listens" on port 11002 (strictly speaking, 
inetd is the one listening; as I understand it, it passes the accepted 
socket on to the process it forks).

(The link with Tomcat is that I also frequently see the same situation, 
where the process "owning" the CLOSE_WAIT socket is Tomcat, more 
specifically one webapp running inside it. It's just that in this 
particular snapshot it isn't.)

What it looks like to me in this case is that at some point one of the 
threads of process 29649 opened a client socket (local port 41764) to 
the local inetd port 11002; that inetd then started the underlying 
server process (the C program); that the C program then at some point 
exited; but that process 29649 never closes its side of the connection 
with port 11002.
Can I somehow detect this condition, and "force" the offending thread of 
process 29649 to close that socket (or just force this thread to exit)?
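For the detection half of the question, here is a minimal Java sketch of what `netstat -p` and `lsof` do on Linux:
1. Read `/proc/net/tcp` and `/proc/net/tcp6`.
2. Keep the rows whose hex state code is `08` (CLOSE_WAIT).
3. Find which `/proc/<pid>/fd/<n>` entry is a symlink to `socket:[inode]`.

This yields the owning PID and file-descriptor number, not the thread. Knowing the descriptor does not by itself give a safe way to close it from outside the process. The class and method names are hypothetical. Reading other users' `fd` directories requires root.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ProcCloseWait {
    // TCP state codes in /proc/net/tcp{,6} are hex; 08 is CLOSE_WAIT.
    static final String CLOSE_WAIT = "08";

    /** Parses /proc/net/tcp-style rows; returns "localPort->remotePort inode=N" per CLOSE_WAIT row. */
    static List<String> closeWaits(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] f = row.trim().split("\\s+");
            // Socket rows start with "N:"; the header row starts with "sl".
            if (f.length < 10 || !f[0].endsWith(":") || !f[3].equals(CLOSE_WAIT)) continue;
            // Addresses look like "0100007F:A324"; the part after the last ':' is the port in hex.
            int local = Integer.parseInt(f[1].substring(f[1].lastIndexOf(':') + 1), 16);
            int remote = Integer.parseInt(f[2].substring(f[2].lastIndexOf(':') + 1), 16);
            out.add(local + "->" + remote + " inode=" + f[9]);
        }
        return out;
    }

    /** Lists "pid fd" pairs whose descriptor links to socket:[inode]. */
    static List<String> holders(String inode) throws IOException {
        String target = "socket:[" + inode + "]";
        List<String> found = new ArrayList<>();
        try (DirectoryStream<Path> procs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path proc : procs) {
                try (DirectoryStream<Path> fds = Files.newDirectoryStream(proc.resolve("fd"))) {
                    for (Path fd : fds) {
                        try {
                            if (Files.readSymbolicLink(fd).toString().equals(target)) {
                                found.add(proc.getFileName() + " " + fd.getFileName());
                            }
                        } catch (IOException e) {
                            // Descriptor closed while scanning; ignore it.
                        }
                    }
                } catch (IOException | DirectoryIteratorException e) {
                    // Process exited, or no permission (run as root to see everything).
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        for (String file : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path p = Paths.get(file);
            if (!Files.exists(p)) continue;
            for (String cw : closeWaits(Files.readAllLines(p))) {
                String inode = cw.substring(cw.indexOf("inode=") + "inode=".length());
                System.out.println(cw + " held by (pid fd): " + holders(inode));
            }
        }
    }
}
```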

I realise this may be a complex question, and that the answers may be 
different for a Tomcat webapp than for a stand-alone process. I would 
be content to just have answers for the webapp case.


Full display of "netstat -pan | grep WAIT":

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:11002         127.0.0.1:41763         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41764         FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41738         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41739         FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41741         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41735         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41755         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41752         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41753         FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41758         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41759         FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41744         TIME_WAIT   -
tcp        0      0 127.0.0.1:11002         127.0.0.1:41749         TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41762  FIN_WAIT2   -
tcp6       0      0 ::ffff:212.85.38.:11100 ::ffff:212.85.38.:41737 TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41743  TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41740  TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41734  TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41754  TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41757  TIME_WAIT   -
tcp6       0      0 ::ffff:212.85.38.:11100 ::ffff:212.85.38.:41751 TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41748  FIN_WAIT2   -
tcp6      12      0 ::ffff:127.0.0.1:41711  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:41708  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:41764  ::ffff:127.0.0.1:11002  CLOSE_WAIT  29649/java
tcp6      12      0 ::ffff:127.0.0.1:41753  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:41759  ::ffff:127.0.0.1:11002  CLOSE_WAIT  29649/java
tcp6      12      0 ::ffff:127.0.0.1:41739  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:39436  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:38989  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:39364  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:39390  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:40859  ::ffff:127.0.0.1:11002  CLOSE_WAIT  13333/java
tcp6       1      0 ::ffff:127.0.0.1:39412  ::ffff:127.0.0.1:11101  CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41249  ::ffff:127.0.0.1:11101  CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41748  ::ffff:127.0.0.1:11101  CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41731  ::ffff:127.0.0.1:11101  CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41762  ::ffff:127.0.0.1:11101  CLOSE_WAIT  2864/java
tcp6       0      0 ::ffff:212.85.38.176:80 ::ffff:212.85.38.:56212 TIME_WAIT   -





---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

