httpd-bugs mailing list archives

From Pawel <pzl...@mp.pl>
Subject Re: httpd hangs ?
Date Wed, 14 Mar 2012 08:52:35 GMT
On 2012-03-14 08:31, Jorge Román Novalbos wrote:
> OK, when Apache hangs, what is the load average? Is it high?
Load average = 1.
When Apache is not running or is dead, there is no other activity on the
server (apart from system tasks such as cron jobs, syslog, etc.).
>
>
> Could you capture a snapshot of the system at the moment of the problem
> (load, top, free, iostat -x 1, the Apache server-status, ps -aux) in order
> to find out where the bottleneck is?
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    1.00    4.00     8.00    16.00     9.60     0.00    0.60    3.00    0.00   0.60   0.30
etherd!e2.3       0.00     0.00  156.00   40.00    24.00   160.00     8.00     0.00    0.24    0.27    0.12   0.24   0.70

The ps -auxf output was attached to the first mail.
The system is not swapping.

>
> Questions:
> What kind of requests are you serving? Only static objects, or also
> dynamic ones (PHP, Java, Ruby)?
It seems that the httpd process is not a worker: it listens on the socket,
and there is no established or waiting connection to the process. It does
not seem to serve any content.
In general, though, Apache uses the PHP and ajp12 modules.
Dynamic content: 93%

>
> Do you know the apache threads size?
Stack: > 8196 but < 16000 (because of ulimit).
With Apache 2.2.19, 8196 was enough.
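
For anyone who wants to verify the stack limit that a running httpd process actually has, here is a minimal Python sketch (not part of the original report). It assumes a Linux system where /proc/<pid>/limits is readable; the function names `parse_stack_limit` and `stack_limit_of` are only illustrative.

```python
from pathlib import Path

LABEL = "Max stack size"


def to_kib(value):
    """Convert a byte count from /proc/<pid>/limits to KiB; None means unlimited."""
    return None if value == "unlimited" else int(value) // 1024


def parse_stack_limit(limits_text):
    """Return (soft_kib, hard_kib) from the text of /proc/<pid>/limits, or None."""
    for line in limits_text.splitlines():
        if line.startswith(LABEL):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(LABEL):].split()[:2]
            return to_kib(soft), to_kib(hard)
    return None


def stack_limit_of(pid):
    """Read the stack limit of a live process (Linux only)."""
    return parse_stack_limit(Path(f"/proc/{pid}/limits").read_text())


if __name__ == "__main__":
    # Made-up sample in the /proc/<pid>/limits layout, for illustration only.
    sample = (
        "Limit                     Soft Limit           Hard Limit           Units\n"
        "Max stack size            8388608              unlimited            bytes\n"
    )
    print(parse_stack_limit(sample))
```

Calling `stack_limit_of(pid)` with the PID of an httpd child reports the limit that process inherited, which can differ from the ulimit of an interactive shell.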

Information about a "normal", live Apache process:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16938 www       20   0  129m  11m 3064 S    2  0.1   0:00.07 httpd

ldd:
        linux-vdso.so.1 =>  (0x000073af9d1e7000)
        /lib64/libsafe.so.2 (0x000073af9cdc2000)
        libz.so.1 => /lib64/libz.so.1 (0x000073af9cba9000)
        libssl.so.1.0.0 => /usr/lib64/libssl.so.1.0.0 (0x000073af9c945000)
        libcrypto.so.1.0.0 => /usr/lib64/libcrypto.so.1.0.0 (0x000073af9c560000)
        libm.so.6 => /lib64/libm.so.6 (0x000073af9c2dd000)
        libexpat.so.1 => /usr/lib64/libexpat.so.1 (0x000073af9c0af000)
        libuuid.so.1 => /lib64/libuuid.so.1 (0x000073af9beaa000)
        librt.so.1 => /lib64/librt.so.1 (0x000073af9bca1000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x000073af9ba6a000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000073af9b84d000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000073af9b649000)
        libc.so.6 => /lib64/libc.so.6 (0x000073af9b2bd000)
        /lib64/ld-linux-x86-64.so.2 (0x000073af9cfc9000)

>
> Hardware details (RAM, CPU)? Is it a virtual or physical server?
Physical, heavily loaded, 16 CPUs, 16 GB RAM.
There are > 200 vhosts, and Apache keeps ~600 log files open. At least 2
files are shared by the user logs of ~50 vhosts (PHP), with ~200 MB of logs
per day per file.

>
> Could you send me the MPM configuration (worker or prefork, MaxClients,
> etc.)?

MaxKeepAliveRequests 50
KeepAliveTimeout 2
ListenBacklog 1

StartServers         80
MinSpareServers      80
MaxSpareServers      250
MaxClients           250
MaxRequestsPerChild  20

For testing I changed:
MinSpareServers = MaxSpareServers = MaxClients = 250
It has been working for ~23 hours so far. Let's see.


This may be helpful:
When the Apache process hangs, no new child processes appear and some
children are zombies. Stopping Apache is possible by sending kill -9 to all
Apache processes. (They die after ~2 minutes.)
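
To make that observation easier to repeat, here is a minimal Python sketch (not part of the original report) that counts httpd processes by state letter, so a hang would show up as one "D" entry plus a growing "Z" count. It assumes the procps `ps -eo pid,stat,comm` column layout; the names `count_httpd_states` and `snapshot` and the sample data are only illustrative.

```python
import subprocess
from collections import Counter


def count_httpd_states(ps_output, command="httpd"):
    """Count processes named `command` by the first letter of their STAT field.

    Expects text in the layout of `ps -eo pid,stat,comm`, header line included.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        _pid, stat, comm = fields
        # Zombies are listed as "httpd <defunct>", so compare the first word only.
        if comm.split()[0] == command:
            counts[stat[0]] += 1
    return counts


def snapshot(command="httpd"):
    """Run ps once and count the states of the matching processes."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_httpd_states(out, command)


if __name__ == "__main__":
    # Made-up sample in the ps column layout, for illustration only.
    sample = (
        "  PID STAT COMMAND\n"
        "19504 Ds   httpd\n"
        "12773 Z    httpd <defunct>\n"
        "12815 Z    httpd <defunct>\n"
        "13135 SL   httpd\n"
        "13210 R    httpd\n"
        "    1 Ss   init\n"
    )
    print(dict(count_httpd_states(sample)))
```

Running `snapshot()` from cron every minute and logging the result would give a timeline of when the zombies start to accumulate.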



Regards & Thanks
Pawel


>
> Thanks!
>
>
> On 13/03/2012, at 16:53, Pawel wrote:
>
>> Hi,
>> No, I do not use NFS.
>> It seems that Apache is not waiting on any filesystem.
>> My "D"-state Apache process keeps only log files open, located on the
>> local filesystem. There is no disk I/O traffic.
>>
>> Pawel
>>
>>
>> On 2012-03-13 15:09, Jorge Román Novalbos wrote:
>>> Hi Pawel,
>>>
>>> I get the same problem when there is a network problem reaching our
>>> NFS volume. Is NFS involved in any Apache process?
>>>
>>> I mean, are the httpd binaries, logs or DocumentRoot on NFS?
>>>
>>> Jorge.
>>>
>>> On 13/03/2012, at 15:03, Pawel wrote:
>>>
>>>> Hello,
>>>>
>>>> After upgrading to 2.2.22 (from 2.2.19), my Apache stops responding
>>>> to network requests.
>>>> It happens on a quite busy system (~200 workers), about once per day.
>>>> One Apache process is in the "D" state.
>>>>
>>>> Apache is running on a 3.2.2 kernel, gcc 4.5.3-r1 p1.0, pie-0.4.5.
>>>>
>>>>
>>>> Is it a known bug?
>>>> Has anyone else seen this behavior?
>>>>
>>>> Thanks
>>>> Pawel
>>>>
>>>>
>>>> *www      19504  0.0  0.0 141084 19828 ?        Ds   Mar09   0:44 /usr/local/apache22/bin/httpd -k start*
>>>> www      12773  0.3  0.0      0     0 ?        Z    09:40   0:01  \_ [httpd] <defunct>
>>>> www      12815  0.4  0.0      0     0 ?        Z    09:41   0:01  \_ [httpd] <defunct>
>>>> www      12844  0.2  0.0      0     0 ?        Z    09:42   0:00  \_ [httpd] <defunct>
>>>> www      12876  0.2  0.0      0     0 ?        Z    09:42   0:01  \_ [httpd] <defunct>
>>>> www      12896  0.2  0.0      0     0 ?        Z    09:43   0:00  \_ [httpd] <defunct>
>>>> www      12918  0.2  0.0      0     0 ?        Z    09:44   0:00  \_ [httpd] <defunct>
>>>> www      12946  0.2  0.0      0     0 ?        Z    09:44   0:00  \_ [httpd] <defunct>
>>>> www      12968  0.2  0.0      0     0 ?        Z    09:45   0:00  \_ [httpd] <defunct>
>>>> www      13001  0.5  0.0      0     0 ?        Z    09:45   0:01  \_ [httpd] <defunct>
>>>> www      13020  0.5  0.0      0     0 ?        Z    09:46   0:01  \_ [httpd] <defunct>
>>>> www      13036  2.2  0.0      0     0 ?        Z    09:46   0:04  \_ [httpd] <defunct>
>>>> www      13057  0.5  0.0      0     0 ?        Z    09:47   0:00  \_ [httpd] <defunct>
>>>> www      13077  2.7  0.0      0     0 ?        Z    09:47   0:03  \_ [httpd] <defunct>
>>>> www      13105  1.3  0.0      0     0 ?        Z    09:48   0:01  \_ [httpd] <defunct>
>>>> www      13135  1.1  0.0 159492 24208 ?        SL   09:48   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>> www      13171  0.5  0.0 156436 21408 ?        SL   09:49   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>> www      13210  0.0  0.0 141084  9628 ?        R    09:49   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>>
>>>> strace (about one message every ~30 seconds):
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13701
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13720
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13740
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12773
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12815
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12844
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12876
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12896
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12918
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12946
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12968
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13001
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13020
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13036
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13057
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13077
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13105
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13135
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13171
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13210
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13245
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13267
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13291
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13310
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13331
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13351
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13372
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13409
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13431
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13449
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13475
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13502
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13564
>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13588
>>>> wait4(-1, 0x7ad781767684, WNOHANG|WSTOPPED, NULL) = 0
>>>> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>>>> write(2, "[Tue Mar 13 10:00:01 2012] [info"..., 179) = 179
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13763
>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13606, si_status=0, si_utime=55, si_stime=24} (Child exited) ---
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13782
>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13639, si_status=0, si_utime=38, si_stime=14} (Child exited) ---
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13810
>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13740, si_status=0, si_utime=89, si_stime=34} (Child exited) ---
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13846
>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13870
>>>>
>>>
>>
>

