httpd-bugs mailing list archives

From Pawel <pzl...@mp.pl>
Subject Re: httpd hangs ?
Date Thu, 22 Mar 2012 16:06:28 GMT
The dump is not the most important thing for me. It's probably PHP code.
I'd like to know how to prevent the other live processes from hanging.
Do you have any idea what I can do?

Thanks

Pawel





On 2012-03-22 14:08, Jorge Román Novalbos wrote:
> If you want, you can set the Apache directive "CoreDumpDirectory
> /tmp/apache2-gdb-dump" and later analyze the cause of the segmentation
> fault with gdb.
>
> Jorge
>
>
> On 22/03/2012, at 14:03, Pawel wrote:
>
>> It seems that directly before every hang, one of the child
>> processes is killed:
>> /[notice] child pid 3818 exit signal Segmentation fault (11)/
>>
>> It seems that some kind of deadlock occurs and the main process is
>> waiting for something.
>>
>> The Apache worker configuration probably doesn't matter.
>> The number of requests per second only changes the probability of the signal.
>>
>> I used another machine with 4 CPUs/4 GB RAM and the same system (disk
>> image copy), and there are no segmentation faults and no hangs.
>> Because of its capacity, that system is handling only part of the
>> production traffic.
>>
>> Pawel
>>
>>
>>
>>
>>
>>
>>
>> On 2012-03-14 10:35, Jorge Román Novalbos wrote:
>>> I would try several things:
>>>
>>> 1- Collect a system profile every minute over a whole day, if possible.
>>> If you have monitoring tools like Nagios or Cacti, you can use them.
>>> 2- Isolate the server and stress it in order to reproduce the
>>> problem and capture the system profile. You could use ab or JMeter for this.
>>> 3- If you don't see the problem, I would try disabling logging
>>> to disk. I don't know if you can afford that, but logging to
>>> disk generally reduces Apache performance.
>>>
>>>  Question:
>>>
>>> Apache hangs when it reaches 200 requests per second, but
>>> does this happen every time Apache reaches that threshold? We have
>>> to try to find some kind of behavior pattern for this.
>>>
>>> Jorge.
>>>
>>>
>>>
>>> On 14/03/2012, at 09:52, Pawel wrote:
>>>
>>>> On 2012-03-14 08:31, Jorge Román Novalbos wrote:
>>>>> OK. When Apache hangs, what is the load average? Is it high?
>>>> =1
>>>> When Apache is not running or is dead, there is no other
>>>> activity on the server (apart from system tasks such as cron jobs, syslog, etc.)
>>>>>
>>>>>
>>>>> Could you get a snapshot of the system at the moment of the problem
>>>>> (I mean load, top, free, iostat -x 1, the Apache server status,
>>>>> ps -aux) in order to find out where the bottleneck is?
>>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>> sda               0.00     0.00    1.00    4.00     8.00    16.00     9.60     0.00    0.60    3.00    0.00   0.60   0.30
>>>> etherd!e2.3       0.00     0.00  156.00   40.00    24.00   160.00     8.00     0.00    0.24    0.27    0.12   0.24   0.70
>>>>
>>>> The ps -auxf output was attached to the first mail.
>>>> The system is not swapping.
>>>>
>>>>>
>>>>> Questions:
>>>>> What kind of requests are you serving? Only static objects, or also
>>>>> dynamic ones? PHP, Java, Ruby?
>>>> It seems that the httpd process is not a worker: it listens on a socket,
>>>> and there is no established or waiting connection to the process. It
>>>> seems that it does not serve any content.
>>>> But in general Apache uses the PHP and ajp12 modules.
>>>> Dynamic content - 93%
>>>>
>>>>>
>>>>> Do you know the Apache thread size?
>>>> stack > 8196 but < 16 000 (because of ulimit)
>>>> On Apache 2.2.19, 8196 was enough.
>>>>
>>>> Information about a "normal" (live) Apache process:
>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> 16938 www       20   0  129m  11m 3064 S    2  0.1   0:00.07 httpd
>>>>
>>>> LDD
>>>>        linux-vdso.so.1 =>  (0x000073af9d1e7000)
>>>>         /lib64/libsafe.so.2 (0x000073af9cdc2000)
>>>>         libz.so.1 => /lib64/libz.so.1 (0x000073af9cba9000)
>>>>         libssl.so.1.0.0 => /usr/lib64/libssl.so.1.0.0 
>>>> (0x000073af9c945000)
>>>>         libcrypto.so.1.0.0 => /usr/lib64/libcrypto.so.1.0.0 
>>>> (0x000073af9c560000)
>>>>         libm.so.6 => /lib64/libm.so.6 (0x000073af9c2dd000)
>>>>         libexpat.so.1 => /usr/lib64/libexpat.so.1 (0x000073af9c0af000)
>>>>         libuuid.so.1 => /lib64/libuuid.so.1 (0x000073af9beaa000)
>>>>         librt.so.1 => /lib64/librt.so.1 (0x000073af9bca1000)
>>>>         libcrypt.so.1 => /lib64/libcrypt.so.1 (0x000073af9ba6a000)
>>>>         libpthread.so.0 => /lib64/libpthread.so.0 (0x000073af9b84d000)
>>>>         libdl.so.2 => /lib64/libdl.so.2 (0x000073af9b649000)
>>>>         libc.so.6 => /lib64/libc.so.6 (0x000073af9b2bd000)
>>>>         /lib64/ld-linux-x86-64.so.2 (0x000073af9cfc9000)
>>>>
>>>>>
>>>>> Hardware specs: RAM, CPU? Is it a virtual or physical server?
>>>> Physical, heavily loaded, 16 CPUs, 16 GB RAM.
>>>> There are > 200 vhosts; Apache keeps ~600 log files open. At
>>>> least 2 files are shared by the user logs of ~50 vhosts (PHP)
>>>> (~200 MB of logs per day per file).
>>>>
>>>>>
>>>>> Could you send me the mpm configuration? worker, prefork, 
>>>>> maxclient, etc...
>>>>
>>>> MaxKeepAliveRequests 50
>>>> KeepAliveTimeout 2
>>>> ListenBacklog 1
>>>>
>>>> StartServers         80
>>>> MinSpareServers      80
>>>> MaxSpareServers      250
>>>> MaxClients           250
>>>> MaxRequestsPerChild  20
>>>>
>>>> For testing I changed:
>>>> MinSpareServers    =  MaxSpareServers      = MaxClients      = 250
>>>> It has been working for ~23 h. Let's see.
>>>>
>>>>
>>>> This could be helpful:
>>>> When the Apache process hangs, no new child processes appear and some
>>>> children are zombies. Stopping Apache is possible by killing all
>>>> Apache processes with signal 9. (They die after ~2 minutes.)
>>>>
>>>>
>>>>
>>>> Regards & Thanks
>>>> Pawel
>>>>
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On 13/03/2012, at 16:53, Pawel wrote:
>>>>>
>>>>>> Hi,
>>>>>> No, I do not use NFS.
>>>>>> It seems that Apache is not waiting for any filesystem.
>>>>>> My "D" Apache process keeps only log files open, and they are located
>>>>>> on a local filesystem. There is no disk I/O traffic.
>>>>>>
>>>>>> Pawel
>>>>>>
>>>>>>
>>>>>> On 2012-03-13 15:09, Jorge Román Novalbos wrote:
>>>>>>> Hi Pawel,
>>>>>>>
>>>>>>> I have had the same problem when a network problem prevented us from
>>>>>>> reaching our NFS volume. Is NFS involved in any Apache process?
>>>>>>>
>>>>>>> I mean, are the httpd binaries, logs or DocumentRoot using NFS?
>>>>>>>
>>>>>>> Jorge.
>>>>>>>
>>>>>>> On 13/03/2012, at 15:03, Pawel wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> After upgrading to 2.2.22 (from 2.2.19), my Apache stops
>>>>>>>> responding to network queries.
>>>>>>>> It happens on a quite busy system (~200 workers), about once per day.
>>>>>>>> One Apache process is in the "D" state.
>>>>>>>>
>>>>>>>> Apache is running on a 3.2.2 kernel, gcc 4.5.3-r1 p1.0, pie-0.4.5
>>>>>>>>
>>>>>>>>
>>>>>>>> Is it a known bug?
>>>>>>>> Has anyone else seen this behavior?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Pawel
>>>>>>>>
>>>>>>>>
>>>>>>>> *www      19504  0.0  0.0 141084 19828 ?        Ds   Mar09   0:44 /usr/local/apache22/bin/httpd -k start*
>>>>>>>> www      12773  0.3  0.0      0     0 ?        Z    09:40   0:01  \_ [httpd] <defunct>
>>>>>>>> www      12815  0.4  0.0      0     0 ?        Z    09:41   0:01  \_ [httpd] <defunct>
>>>>>>>> www      12844  0.2  0.0      0     0 ?        Z    09:42   0:00  \_ [httpd] <defunct>
>>>>>>>> www      12876  0.2  0.0      0     0 ?        Z    09:42   0:01  \_ [httpd] <defunct>
>>>>>>>> www      12896  0.2  0.0      0     0 ?        Z    09:43   0:00  \_ [httpd] <defunct>
>>>>>>>> www      12918  0.2  0.0      0     0 ?        Z    09:44   0:00  \_ [httpd] <defunct>
>>>>>>>> www      12946  0.2  0.0      0     0 ?        Z    09:44   0:00  \_ [httpd] <defunct>
>>>>>>>> www      12968  0.2  0.0      0     0 ?        Z    09:45   0:00  \_ [httpd] <defunct>
>>>>>>>> www      13001  0.5  0.0      0     0 ?        Z    09:45   0:01  \_ [httpd] <defunct>
>>>>>>>> www      13020  0.5  0.0      0     0 ?        Z    09:46   0:01  \_ [httpd] <defunct>
>>>>>>>> www      13036  2.2  0.0      0     0 ?        Z    09:46   0:04  \_ [httpd] <defunct>
>>>>>>>> www      13057  0.5  0.0      0     0 ?        Z    09:47   0:00  \_ [httpd] <defunct>
>>>>>>>> www      13077  2.7  0.0      0     0 ?        Z    09:47   0:03  \_ [httpd] <defunct>
>>>>>>>> www      13105  1.3  0.0      0     0 ?        Z    09:48   0:01  \_ [httpd] <defunct>
>>>>>>>> www      13135  1.1  0.0 159492 24208 ?        SL   09:48   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>>>>>> www      13171  0.5  0.0 156436 21408 ?        SL   09:49   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>>>>>> www      13210  0.0  0.0 141084  9628 ?        R    09:49   0:00  \_ /usr/local/apache22/bin/httpd -k start
>>>>>>>>
>>>>>>>> strace (one message per ~ 30 seconds )
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13701
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13720
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13740
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12773
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12815
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12844
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12876
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12896
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12918
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12946
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 12968
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13001
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13020
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13036
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13057
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13077
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13105
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13135
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13171
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13210
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13245
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13267
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13291
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13310
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13331
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13351
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13372
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13409
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13431
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13449
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13475
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13502
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13564
>>>>>>>> wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 13588
>>>>>>>> wait4(-1, 0x7ad781767684, WNOHANG|WSTOPPED, NULL) = 0
>>>>>>>> select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
>>>>>>>> write(2, "[Tue Mar 13 10:00:01 2012] [info"..., 179) = 179
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13763
>>>>>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13606, si_status=0, si_utime=55, si_stime=24} (Child exited) ---
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13782
>>>>>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13639, si_status=0, si_utime=38, si_stime=14} (Child exited) ---
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13810
>>>>>>>> --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13740, si_status=0, si_utime=89, si_stime=34} (Child exited) ---
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13846
>>>>>>>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x73be642dda10) = 13870
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
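
The ps listing in the first mail of this thread shows the pattern being discussed: a parent httpd in "D" state with a run of <defunct> (zombie) children under it. As a rough aid for spotting that pattern from a monitoring script, here is a minimal Python sketch that counts httpd processes by state from text in the same column layout. The function names and the "D plus zombies" heuristic are assumptions made for illustration; they are not an Apache tool or an official diagnostic rule.

```python
from collections import Counter

# Sample rows copied from the `ps auxf` listing quoted in this thread.
# Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
SAMPLE_PS = "\n".join([
    r"www      19504  0.0  0.0 141084 19828 ?        Ds   Mar09   0:44 /usr/local/apache22/bin/httpd -k start",
    r"www      12773  0.3  0.0      0     0 ?        Z    09:40   0:01  \_ [httpd] <defunct>",
    r"www      12815  0.4  0.0      0     0 ?        Z    09:41   0:01  \_ [httpd] <defunct>",
    r"www      13135  1.1  0.0 159492 24208 ?        SL   09:48   0:00  \_ /usr/local/apache22/bin/httpd -k start",
])


def summarize_httpd_states(ps_text):
    """Count httpd processes by the first letter of the STAT column.

    Hypothetical helper: assumes the 11-column `ps aux` layout shown above.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or "httpd" not in fields[10]:
            continue
        counts[fields[7][0]] += 1
    return counts


def looks_hung(counts):
    """Assumed heuristic: a 'D' httpd together with zombie children."""
    return counts["D"] > 0 and counts["Z"] > 0


if __name__ == "__main__":
    counts = summarize_httpd_states(SAMPLE_PS)
    print(dict(counts))
    print("possible hang:", looks_hung(counts))
```

On a live system the same function could be fed the text output of ps instead of the sample. A short-lived "D" state also occurs on healthy systems, so a single positive result is only a hint.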

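The strace excerpt in the first mail shows the parent calling wait4(-1, ..., WNOHANG|WSTOPPED, ...) repeatedly, collecting one exited child per call until the call returns 0. The following Python sketch illustrates that general non-blocking reaping pattern using os.waitpid. It is not httpd's actual code, it uses only WNOHANG (without the WSTOPPED flag seen in the trace), and the function names are made up for the example.

```python
import os
import time


def reap_exited_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped


def spawn_short_lived_children(n):
    """Fork n children that exit immediately with status 0 (demo only)."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit without running parent cleanup
        pids.append(pid)
    return pids


if __name__ == "__main__":
    spawned = spawn_short_lived_children(3)
    collected = []
    deadline = time.time() + 5
    # Until they are reaped, exited children show up as <defunct> in ps.
    while len(collected) < len(spawned) and time.time() < deadline:
        collected.extend(reap_exited_children())
        time.sleep(0.01)
    print("spawned:", sorted(spawned))
    print("reaped: ", sorted(pid for pid, _ in collected))
```

A child that has exited but has not yet been collected this way remains a zombie, which matches the <defunct> entries in the ps listing while the parent is stuck.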
