httpd-dev mailing list archives

From Jon Snow <jsno...@gatesec.net>
Subject rproxy close_wait high cpu
Date Tue, 25 Oct 2005 13:54:30 GMT

Hi,

I am running Solaris 8 with Apache 2.0.53 over OpenSSL 0.9.7 as a reverse
proxy which terminates SSL and re-establishes it to the backend. I run the
worker MPM with 100 threads per process. The server has minimal load and uses
a crypto card. The build is fairly minimal and runs in a gateway environment.

I have had a lot of issues with processes going into a high-CPU state. These
are always associated with connections in CLOSE_WAIT, and nearly all the time
every connection in the process is in CLOSE_WAIT. Would this suggest that
something shared is affecting all connections/threads in the process?

After trawling through 50 GB of snoop dumps I have found that the latest
incidents appear to be associated with an outbound SSL session that never
fully establishes. These outbound connections receive the server certificate
and should then send the client key/cipher exchange, but nothing follows. The
client-to-proxy session is normal and the request has been received by the
proxy, but the proxy does not establish SSL properly to the server, so it
never forwards the request.
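For what it's worth, the stall would hurt a lot less if the wait for the
backend's handshake reply were bounded. A minimal sketch (a hypothetical
helper, not mod_proxy code; the fd and timeout are illustrative) of bounding
a read with poll():

```c
/* Hypothetical sketch: bound the wait for the backend's handshake
   reply with poll(), so a stalled SSL exchange fails fast instead of
   hanging until the peers' own ~120 s timers fire. */
#include <poll.h>

/* Returns 1 if fd becomes readable before timeout_ms, 0 on timeout,
   -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* stalled: give up and close, don't hang */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

On a timeout the caller would close the backend socket and fail the request
rather than leaving both sides waiting.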

In the meantime the client and server ends time out (approximately 120 s) and
send a FIN, which the proxy acks. In both cases the proxy neither returns a
FIN nor times out itself. At some point the process goes to high CPU. This is
very repeatable on a server with a load of around 10-15 connections per
second, and it occurs on average 1-2 times a day. Unfortunately this is
killing us on a highly political service.
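To illustrate the CLOSE_WAIT mechanics described above (this is an
illustrative demo, not httpd code): once the peer's FIN arrives, the kernel
acks it and read() returns 0, but the socket stays in CLOSE_WAIT until the
application itself calls close(). A proxy that never acts on the EOF
accumulates exactly these half-closed connections.

```c
/* Demo: what our side of the socket sees after the peer sends a FIN. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Builds a loopback TCP pair, has the "peer" close (sending a FIN),
   and returns what read() sees on our side: 0 means EOF, i.e. the
   socket sits in CLOSE_WAIT until we close() it. -1 on setup error. */
static ssize_t read_after_peer_fin(void) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char buf[16];

    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                              /* ephemeral port */
    if (bind(lsock, (struct sockaddr *)&addr, sizeof addr) < 0) return -1;
    if (listen(lsock, 1) < 0) return -1;
    if (getsockname(lsock, (struct sockaddr *)&addr, &len) < 0) return -1;

    int peer = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(peer, (struct sockaddr *)&addr, sizeof addr) < 0) return -1;
    int ours = accept(lsock, NULL, NULL);
    if (ours < 0) return -1;

    close(peer);                        /* peer sends FIN; kernel acks it */
    ssize_t n = read(ours, buf, sizeof buf);        /* 0 = EOF */
    /* "ours" is in CLOSE_WAIT here until we close it */
    close(ours);
    close(lsock);
    return n;
}
```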

While SSL appears to be the trigger, Apache does not seem to be handling the
situation very well. It looks like Apache knows it has data to send to the
server and data to return to the client, so it does not close its side of the
socket. At some point it must go into some sort of loop trying to recover, I
guess.

I can't recall truss returning anything on these runaway reverse proxy
processes, but I managed to get a truss of a forward proxy using huge amounts
of memory. It looks like Apache is not checking the return value of its
write. From memory, in this instance there was no longer an inside
connection, but Apache was still trying to write to one. The forward proxy
servers also suffer from CLOSE_WAITs when memory use is huge, but they do not
suffer from high CPU. I am thinking the same kind of issue may exist in the
reverse proxy, where it loops reading/writing a closed socket without
checking the return value, resulting in the high CPU.
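The failure mode I suspect would look something like this (hypothetical code,
not the actual mod_proxy source): a copy loop that ignores the
write()/writev() return value never learns the other socket is gone, so it
can spin forever at 100% CPU. Checking for a failed write (EPIPE/ECONNRESET)
and a zero-byte read, then bailing out so the socket gets closed, avoids both
the spin and the CLOSE_WAIT pile-up.

```c
/* Sketch: a relay loop that actually checks its return values. */
#include <errno.h>
#include <unistd.h>

/* Copies bytes from from_fd to to_fd until EOF.
   Returns 0 on clean EOF, -1 on error; caller closes both fds. */
static int relay(int from_fd, int to_fd) {
    char buf[8000];
    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                   /* peer finished: time to close() */
        if (n < 0)
            return -1;
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(to_fd, buf + off, (size_t)(n - off));
            if (w <= 0)                 /* EPIPE/ECONNRESET etc.: abort   */
                return -1;              /* instead of looping forever     */
            off += w;
        }
    }
}
```

A loop that instead ignored `w` and kept calling read()/write() would match
the truss output below: endless poll/read/writev cycles with every writev
failing EPIPE.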

I have now turned on debug logging for SSL diagnostics, but does anyone have
any idea what may be going on here with Apache and the high CPU? When does
the connection timer start: before the connection is attempted, or after it
is established? Is there any more info you need?

Thanks,
Jon


Forward Proxy stuff
###############################################################

16786:  writev(128, 0xFFFFFFFF7440C860, 6)              Err#32 EPIPE
16786:  write(6, 0xFFFFFFFF7440A510, 99)                = 99
16786:     [ T u e   A p r   1 9   0 8 : 2 7 : 3 1   2 0 0 5 ]   [ i n f o
16786:     ]   ( 3 2 ) B r o k e n   p i p e :   c o r e _ o u t p u t _ f
16786:     i l t e r :   w r i t i n g   d a t a   t o   t h e   n e t w o
16786:     r k\n
16786:  lwp_sigredirect(0, SIGPIPE, 0x00000000)         = 0
16786:  poll(0xFFFFFFFF7440C3C0, 1, 300000)             = 1
16786:  read(131, "C9 nF216 R 08DF303CFE7 F".., 8000)   = 1360
16786:  poll(0xFFFFFFFF7440C3C0, 1, 300000)             = 1
16786:  read(131, "11 1 # ; GF9FD + T04 e R".., 8000)   = 1360
16786:  poll(0xFFFFFFFF7440C3C0, 1, 300000)             = 1
16786:  read(131, "10 91C f94 I93D2A6CD15 {".., 8000)   = 1360
16786:  poll(0xFFFFFFFF7440C3C0, 1, 300000)             = 1
16786:  read(131, "AEE8C78F9504B8 VF607918C".., 8000)   = 1360
16786:  poll(0xFFFFFFFF7440C3C0, 1, 300000)             = 1
16786:  read(131, " 0 r K\0 UB0A0\fF3C8 l93".., 8000)   = 1360
16786:  poll(0xFFFFFFFF7440C3C0, 1, 300000)             = 1
16786:  read(131, " \ `1EA4D51BCF % /A7 .F3".., 8000)   = 1360
16786:  signotifywait()                                 = 13
16786:  writev(128, 0xFFFFFFFF7440C860, 6)              Err#32 EPIPE
16786:  write(6, 0xFFFFFFFF7440A510, 99)                = 99
16786:     [ T u e   A p r   1 9   0 8 : 2 7 : 3 2   2 0 0 5 ]   [ i n f o
16786:     ]   ( 3 2 ) B r o k e n   p i p e :   c o r e _ o u t p u t _ f
16786:     i l t e r :   w r i t i n g   d a t a   t o   t h e   n e t w o
16786:     r k\n
16786:  lwp_sigredirect(0, SIGPIPE, 0x00000000)         = 0
16786:  poll(0xFFFFFFFF6DA063C0, 1, 300000)             = 1
16786:  read(45, "11BE K1501 w01C1 = + Z X".., 8000)    = 4080
16786:  signotifywait()                                 = 13
16786:  writev(70, 0xFFFFFFFF6DA06860, 5)               Err#32 EPIPE
16786:  write(6, 0xFFFFFFFF6DA04510, 99)                = 99
16786:     [ T u e   A p r   1 9   0 8 : 2 7 : 3 3   2 0 0 5 ]   [ i n f o
16786:     ]   ( 3 2 ) B r o k e n   p i p e :   c o r e _ o u t p u t _ f
16786:     i l t e r :   w r i t i n g   d a t a   t o   t h e   n e t w o
16786:     r k\n
16786:  lwp_sigredirect(0, SIGPIPE, 0x00000000)         = 0

###################################################################

100 processes: 98 sleeping, 1 running, 1 on cpu
CPU states: 88.0% idle,  4.8% user,  7.2% kernel,  0.0% iowait,  0.0% swap
Memory: 1024M real, 246M free, 2595M swap in use, 570M swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 16786  11000     6  59    0  156M  130M sleep   13:55  0.12% httpd
 16671  11000     5  59    0   93M   68M sleep    7:32  0.04% httpd
 16813  11000     4  59    0   91M   68M sleep    7:05  0.04% httpd

###################################################################





