httpd-bugs mailing list archives

From Pawel <pzl...@mp.pl>
Subject Re: httpd hangs ?
Date Thu, 22 Mar 2012 13:03:05 GMT
It seems that one of the child processes is always killed directly 
before a hang:
[notice] child pid 3818 exit signal Segmentation fault (11)

It looks like some kind of deadlock occurs and the main process is 
waiting for something.

The Apache worker configuration probably does not matter.
The number of requests per second only changes the probability of the signal.

I tried another machine with 4 CPUs and 4 GB of RAM running the same 
system (a disk image copy), and there are no segmentation faults and no 
hangs. Because of its capacity, that system handles only part of the 
production traffic.

Pawel
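
One way to check whether every hang really follows a segfault is to pull the segfault notices out of the error log and compare their timestamps with the times of the hangs. Below is a minimal Python sketch under stated assumptions: the log uses the Apache 2.2 "[day mon dd hh:mm:ss yyyy] [level]" line format seen elsewhere in this thread, the name find_segfaults is hypothetical, and the timestamps in the sample lines are invented for illustration.

```python
import re
from datetime import datetime

# Matches Apache 2.2 error-log lines of the form
# [Thu Mar 22 12:58:40 2012] [notice] child pid 3818 exit signal Segmentation fault (11)
SEGV_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[notice\] child pid (?P<pid>\d+) "
    r"exit signal Segmentation fault \(11\)"
)


def find_segfaults(lines):
    """Return a list of (timestamp, pid) for each child-segfault notice."""
    hits = []
    for line in lines:
        m = SEGV_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
            hits.append((ts, int(m.group("pid"))))
    return hits


if __name__ == "__main__":
    # Sample lines; the timestamps are made up for this illustration.
    sample = [
        "[Thu Mar 22 12:58:40 2012] [notice] child pid 3818 "
        "exit signal Segmentation fault (11)",
        "[Thu Mar 22 12:58:41 2012] [info] unrelated message",
    ]
    for ts, pid in find_segfaults(sample):
        print(ts.isoformat(), pid)
```

In practice the lines would come from the real error log (for example, open(path) on whatever file ErrorLog points to), and the resulting timestamps could be set against the times at which httpd stopped answering.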







On 2012-03-14 10:35, Jorge Román Novalbos wrote:
> I would try several things:
>
> 1- Capture a system profile every minute for a whole day, if possible. 
> If you have monitoring tools like Nagios or Cacti, you can use them.
> 2- Isolate the server and stress it in order to reproduce the 
> problem and see the system profile. You could use ab or JMeter for this.
> 3- If you don't see the problem, I would try disabling logging 
> to disk. I don't know if you can afford that, but logging to 
> disk generally reduces Apache performance.
>
> Question:
>
> Apache hangs when it reaches 200 requests per second, but does this 
> happen every time Apache reaches that threshold? We have to try 
> to find some kind of behavior pattern for this.
>
> Jorge.
>
>
>
> On 14/03/2012, at 09:52, Pawel wrote:
>
>> On 2012-03-14 08:31, Jorge Román Novalbos wrote:
>>> OK. When Apache hangs, what is the load average? Is it high?
>> =1
>> When Apache is not running or is dead, there is no other activity 
>> on the server (apart from system tasks such as cron jobs, syslog, etc.)
>>>
>>>
>>> Could you capture a snapshot of the system at the moment of the 
>>> problem (load, top, free, iostat -x 1, the Apache server status, 
>>> ps -aux) in order to find out where the bottleneck is?
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> sda               0.00     0.00    1.00    4.00     8.00    16.00     9.60     0.00    0.60    3.00    0.00   0.60   0.30
>> etherd!e2.3       0.00     0.00  156.00   40.00    24.00   160.00     8.00     0.00    0.24    0.27    0.12   0.24   0.70
>>
>> The ps -auxf output was attached to the first mail.
>> The system is not swapping.
>>
>>>
>>> Questions:
>>> What kind of requests are you serving? Only static objects, or also 
>>> dynamic ones? PHP, Java, Ruby?
>> It seems that the httpd process is not a worker. It listens on the 
>> socket, but there are no established or waiting connections to the 
>> process. It seems that it does not serve any content.
>> In general, though, Apache uses the PHP and ajp12 modules.
>> Dynamic content: 93%
>>
>>>
>>> Do you know the Apache thread size?
>> stack > 8196 but < 16 000 (because of ulimit)
>> On Apache 2.2.19, 8196 was enough.
>>
>> Information about a "normal", live Apache process:
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 16938 www       20   0  129m  11m 3064 S    2  0.1   0:00.07 httpd
>>
>> LDD
>>        linux-vdso.so.1 =>  (0x000073af9d1e7000)
>>         /lib64/libsafe.so.2 (0x000073af9cdc2000)
>>         libz.so.1 => /lib64/libz.so.1 (0x000073af9cba9000)
>>         libssl.so.1.0.0 => /usr/lib64/libssl.so.1.0.0 (0x000073af9c945000)
>>         libcrypto.so.1.0.0 => /usr/lib64/libcrypto.so.1.0.0 (0x000073af9c560000)
>>         libm.so.6 => /lib64/libm.so.6 (0x000073af9c2dd000)
>>         libexpat.so.1 => /usr/lib64/libexpat.so.1 (0x000073af9c0af000)
>>         libuuid.so.1 => /lib64/libuuid.so.1 (0x000073af9beaa000)
>>         librt.so.1 => /lib64/librt.so.1 (0x000073af9bca1000)
>>         libcrypt.so.1 => /lib64/libcrypt.so.1 (0x000073af9ba6a000)
>>         libpthread.so.0 => /lib64/libpthread.so.0 (0x000073af9b84d000)
>>         libdl.so.2 => /lib64/libdl.so.2 (0x000073af9b649000)
>>         libc.so.6 => /lib64/libc.so.6 (0x000073af9b2bd000)
>>         /lib64/ld-linux-x86-64.so.2 (0x000073af9cfc9000)
>>
>>>
>>> Hardware features: RAM, CPU? Is it a virtual or physical server?
>> Physical, heavily loaded, 16 CPUs, 16 GB RAM.
>> There are > 200 vhosts, and Apache keeps ~600 log files open. At least 
>> 2 files are shared as user logs by ~50 vhosts (PHP), (~200 MB of logs 
>> per day per file).
>>
>>>
>>> Could you send me the MPM configuration? worker, prefork, MaxClients, 
>>> etc.
>>
>> MaxKeepAliveRequests 50
>> KeepAliveTimeout 2
>> ListenBacklog 1
>>
>> StartServers         80
>> MinSpareServers      80
>> MaxSpareServers      250
>> MaxClients           250
>> MaxRequestsPerChild  20
>>
>> For testing I changed:
>> MinSpareServers    =  MaxSpareServers      = MaxClients      = 250
>> It has been working for ~23 h. Let's see.
>>
>>
>> This could be helpful:
>> When the Apache process hangs, no new child processes appear and some 
>> children are zombies. Stopping Apache is possible by killing all Apache 
>> processes with signal 9. (They die after ~2 minutes.)
>>
>>
>>
>> Regards & Thanks
>> Pawel
>>
>>
>>>
>>> Thanks!
>>>
>>>
>>> On 13/03/2012, at 16:53, Pawel wrote:
>>>
>>>> Hi,
>>>> No, I do not use NFS
>>>> It seems that Apache is not waiting on any filesystem.
>>>> My "D" Apache process keeps only log files open, and they are 
>>>> located on a local filesystem. There is no disk I/O traffic.
>>>>
>>>> Pawel
>>>>
>>>>
>>>> On 2012-03-13 15:09, Jorge Román Novalbos wrote:
>>>>> Hi Pawel,
>>>>>
>>>>> I had the same problem when there were network problems reaching 
>>>>> our NFS volume. Is NFS involved in any Apache process?
>>>>>
>>>>> I mean, are the httpd binaries, logs, or DocumentRoot using NFS?
>>>>>
>>>>> Jorge.
>>>>>
>>>>> On 13/03/2012, at 15:03, Pawel wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> After upgrading to 2.2.22 (from 2.2.19), my Apache stops 
>>>>>> responding to network queries.
>>>>>> It happens on a quite busy system (~200 workers), about once per day.
>>>>>> One Apache process is in the "D" state.
>>>>>>
>>>>>> Apache is running on a 3.2.2 kernel, gcc 4.5.3-r1 p1.0, pie-0.4.5.
>>>>>>
>>>>>>
>>>>>> Is it a known bug?
>>>>>> Has anyone else seen this behavior?
>>>>>>
>>>>>> Thanks
>>>>>> Pawel
>>>>>>
>>>>>>
>>>>>> www      19504  0.0  0.0 141084 19828 ?        Ds   Mar09   0:44 /usr/local/apache22/bin/httpd -k start
>>>>>> www      12773  0.3  0.0      0     0 ?        Z    09:40   0:01  \_ [httpd] <defunct>
>>>>>> www      12815  0.4  0.0      0     0 ?        Z    09:41   0:01  \_ [httpd] <defunct>
>>>>>> www      12844  0.2  0.0      0     0 ?        Z    09:42   0:00  \_ [httpd] <defunct>
>>>>>> www      12876  0.2  0.0      0     0 ?        Z    09:42   0:01  \_ [httpd] <defunct>
>>>>>> www      12896  0.2  0.0      0     0 ?        Z    09:43   0:00  \_ [httpd] <defunct>
>>>>>> www      12918  0.2  0.0      0     0 ?        Z    09:44   0:00  \_ [httpd] <defunct>
>>>>>> www      12946  0.2  0.0      0     0 ?        Z    09:44   0:00  \_ [httpd] <defunct>
>>>>>> www      12968  0.2  0.0      0     0 ?        Z    09:45   0:00  \_ [httpd] <defunct>
>>>>>> www      13001  0.5  0.0      0     0 ?        Z    09:45   0:01  \_ [httpd] <defunct>
>>>>>> www      13020  0.5  0.0      0     0 ?        Z    09:46   0:01  \_ [httpd] <defunct>
>>>>>> www      13036  2.2  0.0      0     0 ?        Z    09:46   0:04  \_ [httpd] <defunct>
>>>>>> www      13057  0.5  0.0      0     0 ?        Z    09:47   0:00  \_ [httpd] <defunct>
>>>>>> www      13077  2.7  0.0      0     0 ?        Z    09:47   0:03  \_ [httpd] <defunct>
>>>>>> www      13105  1.3  0.0      0     0 ?        Z    09:48   0:01  \_ [httpd] <defunct>
>>>>>> www      13135  1.1  0.0 159492 24208 ?        SL   09:48   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>>>> www      13171  0.5  0.0 156436 21408 ?        SL   09:49   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>>>> www      13210  0.0  0.0 141084  9628 ?        R    09:49   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>>>>
>>>>>> strace (one message every ~30 seconds):
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13701
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13720
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13740
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12773
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12815
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12844
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12876
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12896
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12918
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12946
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12968
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13001
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13020
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13036
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13057
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13077
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13105
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13135
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13171
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13210
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13245
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13267
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13291
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13310
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13331
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13351
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13372
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13409
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13431
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13449
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13475
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13502
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13564
>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13588
>>>>>> wait4(-1, 0x7ad781767684, WNOHANG|WSTOPPED, NULL) = 0
>>>>>> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>>>>>> write(2, "[Tue Mar 13 10:00:01 2012] [info"..., 179) = 179
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13763
>>>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13606, si_status=0, si_utime=55, si_stime=24} (Child exited) ---
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13782
>>>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13639, si_status=0, si_utime=38, si_stime=14} (Child exited) ---
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13810
>>>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13740, si_status=0, si_utime=89, si_stime=34} (Child exited) ---
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13846
>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13870
>>>>>>
>>>>>
>>>>
>>>
>>
>
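
The ps listing in the original report shows one httpd process in state D followed by a run of defunct (Z) children. To watch that pattern over time, a small script could tally the process states from saved ps output. This is a minimal Python sketch, assuming the standard 11-column "ps aux" layout with STAT in the eighth column; the function name count_httpd_states is hypothetical.

```python
from collections import Counter


def count_httpd_states(ps_lines):
    """Tally the first letter of the STAT column for httpd rows."""
    states = Counter()
    for line in ps_lines:
        fields = line.split()
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        if len(fields) < 11:
            continue
        command = " ".join(fields[10:])
        if "httpd" not in command:
            continue
        # "Ds" -> "D", "SL" -> "S", "Z" -> "Z"
        states[fields[7][0]] += 1
    return states


if __name__ == "__main__":
    # Two rows taken from the listing in the original report.
    rows = [
        "www      19504  0.0  0.0 141084 19828 ?        Ds   Mar09   0:44 "
        "/usr/local/apache22/bin/httpd -k start",
        r"www      12773  0.3  0.0      0     0 ?        Z    09:40   0:01  "
        r"\_ [httpd] <defunct>",
    ]
    print(dict(count_httpd_states(rows)))
```

Run periodically against fresh ps output, a growing Z count alongside a persistent D entry would match the state described in the report.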

