httpd-users mailing list archives

From Kenneth Berland <...@hero.com>
Subject [users@httpd] 100% CPU and Orphaned Children
Date Thu, 08 Jan 2004 17:28:17 GMT
Server Version: Apache/1.3.29 (Unix) PHP/4.3.2 mod_perl/1.29  mod_ssl/2.8.16 OpenSSL/0.9.7c
Server Built: Jan 7 2004 08:58:39

I'm having trouble with runaway server processes.  After running for a
varying length of time, the parent (root) process starts using 100% of the
CPU.  The children still function.  I upgraded from 1.3.27, where I had a
similar problem, except that there it was the children using 100% of the CPU.

I think the problem started with a kernel upgrade to 2.4.23, but at first I
assumed it was a user-space problem and found a user with some new PHP to
reprimand.

Below are the diagnostics I used to try to figure it out.  First is the
top output, then an strace and an ltrace of the parent (I forgot to trace
a child).  Next is the error log, showing children that did not exit and
had to be sent SIGTERM and then SIGKILL.  Last are SEGVs from the error
log.  They have been much more frequent (about 10x) since the kernel
upgrade, yet they don't seem to be fatal, and I can't tell from the logs
whether any specific page is causing them.

Thanks for any help; I have no idea where to start on this one.

 08:38:01  up 2 days,  9:57,  6 users,  load average: 2.14, 2.20, 2.18
73 processes: 70 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  15.2% user   1.0% system   0.0% nice   0.0% iowait  83.7% idle
Mem:   904504k av,  844388k used,   60116k free,       0k shrd,  212464k buff
       297168k active,             260340k inactive
Swap: 1004052k av,    2556k used, 1001496k free                  268028k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
28382 root       8   0 11452  11M 11112 S    97.2  1.2  91:31   0 httpd
 4326 root      11   0  1116 1116   812 R     2.8  0.1   0:00   0 top
    1 root       8   0   504  504   456 S     0.0  0.0   0:05   0 init
    2 root       9   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    3 root      19  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd_CPU0
    4 root       9   0     0    0     0 SW    0.0  0.0   0:12   0 kswapd
    5 root       9   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
    6 root       9   0     0    0     0 SW    0.0  0.0   0:00   0 kupdated
    8 root       9   0     0    0     0 SW    0.0  0.0   0:00   0 khubd
    9 root       9   0     0    0     0 SW    0.0  0.0   0:24   0 kjournald
   95 root       9   0     0    0     0 SW    0.0  0.0   0:06   0 kjournald
   96 root       9   0     0    0     0 SW    0.0  0.0   0:02   0 kjournald
  397 root       9   0   576  576   496 S     0.0  0.0   0:14   0 syslogd
  401 root       9   0   460  460   408 S     0.0  0.0   0:00   0 klogd
  450 daemon     9   0   548  544   492 S     0.0  0.0   0:00   0 atd
  459 root       9   0  1172 1108  1000 S     0.0  0.1   0:05   0 sshd
[root@hero root]# strace -p 28382
select(0, NULL, NULL, NULL, {0, 440000}) = 0 (Timeout)
time(NULL)                              = 1073579891
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1073579892
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1073579893
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1073579894
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1073579895
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1073579896
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>
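
One way to read the strace above is to check how often the parent actually
wakes up.  The following is only a sketch in Python, not part of the
original session: `wakeups_per_second` is a hypothetical helper name, and it
assumes the trace has been saved as text in the format shown.  It pulls the
`time(NULL)` return values out of the trace and reports calls per elapsed
second.

```python
import re

# Matches strace lines such as: time(NULL)   = 1073579891
TIME_CALL = re.compile(r"^time\(NULL\)\s*=\s*(\d+)\s*$")


def wakeups_per_second(trace_text):
    """Return the average number of time(NULL) calls per elapsed second."""
    stamps = []
    for line in trace_text.splitlines():
        match = TIME_CALL.match(line.strip())
        if match:
            stamps.append(int(match.group(1)))
    if len(stamps) < 2 or stamps[-1] == stamps[0]:
        return None  # not enough data to estimate a rate
    return (len(stamps) - 1) / (stamps[-1] - stamps[0])


# A few lines copied from the strace output above.
sample = """\
select(0, NULL, NULL, NULL, {0, 440000}) = 0 (Timeout)
time(NULL)                              = 1073579891
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1073579892
wait4(-1, 0xbffff9bc, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 1073579893
"""

rate = wakeups_per_second(sample)
print(rate)
```

A rate close to one per second would suggest the parent spends nearly all of
its time asleep in select(), which would be worth comparing against the CPU
figure that top reports for the same PID.
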
[root@hero root]# ps -efw | grep http
nobody   26415 28382  0 02:48 ?        00:00:32 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
nobody     510 28382 40 06:40 ?        00:47:20 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
nobody    1061 28382 44 06:59 ?        00:43:47 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
nobody    3778 28382  0 08:25 ?        00:00:02 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
nobody    3787 28382  0 08:26 ?        00:00:03 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
nobody    3950 28382  0 08:30 ?        00:00:01 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
nobody    4316 28382  0 08:37 ?        00:00:00 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
root     28382     1  0 Jan07 ?        00:00:00 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
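
In the ps listing above, the TIME column credits most of the accumulated CPU
to PIDs 510 and 1061, while the parent (28382) shows 00:00:00.  The sketch
below is one possible way to rank such a listing by CPU time; it is written
in Python, the helper names `cpu_seconds` and `rank_by_cpu` are hypothetical,
and it assumes the `ps -ef` column order UID PID PPID C STIME TTY TIME CMD
with a plain HH:MM:SS TIME field.

```python
def cpu_seconds(time_field):
    """Convert a ps TIME field such as 00:47:20 to seconds."""
    hours, minutes, seconds = (int(part) for part in time_field.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def rank_by_cpu(ps_text):
    """Return (pid, ppid, seconds) tuples, highest CPU time first."""
    rows = []
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip headers and wrapped pieces such as "-DSSL"
        rows.append((int(fields[1]), int(fields[2]), cpu_seconds(fields[6])))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Three rows copied from the listing above (one of them mail-wrapped).
sample = """\
nobody     510 28382 40 06:40 ?        00:47:20 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf
-DSSL
nobody    1061 28382 44 06:59 ?        00:43:47 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
root     28382     1  0 Jan07 ?        00:00:00 /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf -DSSL
"""

ranked = rank_by_cpu(sample)
for pid, ppid, seconds in ranked:
    print(pid, ppid, seconds)
```

Comparing this ranking with what top shows for the parent may help tell
whether the CPU is really being burned in the parent or in a few long-lived
children.
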
[root@hero root]# ltrace -p28382
time(NULL)                                        = 1073579917
waitpid(-1, 0xbffff9bc, 1, 0x080a8b13, 11)        = 0
select(0, 0, 0, 0, 0xbffff990)                    = 0
time(NULL)                                        = 1073579918
waitpid(-1, 0xbffff9bc, 1, 0x080a8b13, 11)        = 0
select(0, 0, 0, 0, 0xbffff990)                    = 0
time(NULL)                                        = 1073579919
waitpid(-1, 0xbffff9bc, 1, 0x080a8b13, 11)        = 0
select(0, 0, 0, 0, 0xbffff990)                    = 0
time(NULL)                                        = 1073579920
waitpid(-1, 0xbffff9bc, 1, 0x080a8b13, 11)        = 0
select(0, 0, 0, 0, 0xbffff990 <unfinished ...>
[root@hero root]# /etc/init.d/httpd
[root@hero root]# cat /www/logs/error_log
[Thu Jan  8 08:39:47 2004] [warn] child process 510 still did not exit, sending a SIGTERM
[Thu Jan  8 08:39:47 2004] [warn] child process 26415 still did not exit, sending a SIGTERM
[Thu Jan  8 08:39:47 2004] [warn] child process 1061 still did not exit, sending a SIGTERM
[Thu Jan  8 08:39:51 2004] [error] child process 510 still did not exit, sending a SIGKILL
[Thu Jan  8 08:39:51 2004] [error] child process 26415 still did not exit, sending a SIGKILL
[Thu Jan  8 08:39:51 2004] [error] child process 1061 still did not exit, sending a SIGKILL
[Thu Jan  8 08:39:51 2004] [notice] caught SIGTERM, shutting down




There are also many of these in the error log:

[Tue Jan  6 18:32:34 2004] [notice] child pid 6243 exit signal Segmentation fault (11)
[Tue Jan  6 18:32:44 2004] [notice] child pid 6242 exit signal Segmentation fault (11)
[Tue Jan  6 18:41:00 2004] [notice] child pid 6244 exit signal Segmentation fault (11)
[Tue Jan  6 18:46:04 2004] [notice] child pid 6786 exit signal Segmentation fault (11)
[Tue Jan  6 19:04:36 2004] [notice] child pid 7035 exit signal Segmentation fault (11)
[Tue Jan  6 19:04:36 2004] [notice] child pid 6783 exit signal Segmentation fault (11)
[Tue Jan  6 21:43:22 2004] [notice] child pid 10608 exit signal Segmentation fault (11)
[Tue Jan  6 21:43:33 2004] [notice] child pid 10553 exit signal Segmentation fault (11)
[Tue Jan  6 21:44:03 2004] [notice] child pid 10414 exit signal Segmentation fault (11)
[Tue Jan  6 22:06:24 2004] [notice] child pid 10416 exit signal Segmentation fault (11)
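
To put a number on how much more frequent the SEGVs are, the notices could be
tallied per day and compared before and after the kernel upgrade.  What
follows is only a sketch in Python: `segfaults_per_day` is a hypothetical
helper, and it assumes each entry sits on a single line, as it would in the
log file itself rather than wrapped by the mail client.

```python
import re
from collections import Counter
from datetime import datetime

# Matches notices such as:
# [Tue Jan  6 18:32:34 2004] [notice] child pid 6243 exit signal Segmentation fault (11)
SEGV = re.compile(
    r"^\[(?P<stamp>[^\]]+)\] \[notice\] child pid (?P<pid>\d+) exit signal "
    r"Segmentation fault \(11\)"
)


def segfaults_per_day(log_lines):
    """Count segfault notices per calendar day (ISO date string -> count)."""
    counts = Counter()
    for line in log_lines:
        match = SEGV.match(line)
        if not match:
            continue  # ignore every other kind of log entry
        # Collapse the padded day field ("Jan  6") before parsing.
        stamp = " ".join(match.group("stamp").split())
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        counts[when.date().isoformat()] += 1
    return counts


# Entries taken from the excerpts in this message.
sample = [
    "[Tue Jan  6 18:32:34 2004] [notice] child pid 6243 exit signal Segmentation fault (11)",
    "[Tue Jan  6 18:32:44 2004] [notice] child pid 6242 exit signal Segmentation fault (11)",
    "[Tue Jan  6 22:06:24 2004] [notice] child pid 10416 exit signal Segmentation fault (11)",
    "[Thu Jan  8 08:39:51 2004] [notice] caught SIGTERM, shutting down",
]

counts = segfaults_per_day(sample)
print(dict(counts))
```

Against the real file this would be fed the open file object, for example
`segfaults_per_day(open("/www/logs/error_log"))`, using the log path shown
earlier in this message.
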


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

