httpd-users mailing list archives

From David Gatwood <dgatw...@mac.com>
Subject [users@httpd] Tracking down cause of long server stalls
Date Wed, 16 Feb 2005 22:04:22 GMT

I'm trying to track down a bizarre server stall on Mac OS X. I've
reproduced it on four machines now, configured by different people in
different ways: one is running Mac OS X Server and the other three run
the client version. The three client systems are all configured with
Apache 1.3.33 (the stock Mac OS X install). I'm not sure what's running
on the Mac OS X Server machine, as I don't administer that one.

The behavior is this: random requests end up taking an inordinate amount
of time, on the order of thirty seconds.

Things I've ruled out:

1.  The hard drive spinning down.
2.  Network packet loss.  This occurs even when connecting from the
same machine Apache is running on.
3.  Paging.  One of the machines has over a gig of RAM.
4.  Something else tying up the CPU.  (That would be obvious when
connecting from the same machine....)
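
Point 2 suggests a way to catch a stall in the act without the network
involved: hit the server repeatedly over the loopback interface and
record which requests are slow. A minimal sketch using only the Python
standard library, assuming the server answers on http://127.0.0.1/ (a
placeholder URL); `time_requests` and `slow_requests` are hypothetical
helper names, not part of any existing tool:

```python
import time
import urllib.request


def time_requests(url, count=100, timeout=60.0):
    """Fetch `url` `count` times in a row; return each request's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the full response is timed
        durations.append(time.monotonic() - start)
    return durations


def slow_requests(durations, threshold=5.0):
    """Return (index, seconds) for every request slower than `threshold` seconds."""
    return [(i, d) for i, d in enumerate(durations) if d > threshold]
```

For example, `print(slow_requests(time_requests("http://127.0.0.1/", count=200)))`
would list only the requests that stalled, with their position in the run.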

There are some clues, however. After these stalls, I end up with a
nastygram deposited in error_log:

	shell-init: could not get current directory: getcwd: cannot access parent directories: Permission denied
	  % Total    % Received % Xferd  Average Speed          Time             Curr.
	                                 Dload  Upload Total    Current  Left    Speed
	100   512  100   512    0     0    341      0  0:00:01  0:00:01  0:00:00  2813

though that might be unrelated.

A ktrace on all of the httpd processes looks largely normal except for
one... umm... surprise.

It looks like one of the helper processes writes a file out to
descriptor 3 (presumably to the main server process), then writes a line
that includes the IP address to which the data should be sent, along
with the request line and status information, but writes that to fd 17.
Then it does this:

   1144 httpd    0.000042 CALL  sigaction(0x1e,0xbffff9a0,0xbffffa10)
   1144 httpd    0.000014 RET   sigaction 0
   1144 httpd    0.000027 CALL  read(0x3,0x817a90,0x1000)
   1144 httpd    16.842855 RET   read -1 errno 4 Interrupted system call
   1144 httpd    0.000020 PSIG  SIGALRM caught handler=0x3de8 mask=0x0 code=0x0
   1144 httpd    0.000046 CALL  close(0x3)
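
The trace has the shape of a blocking read with a timer-based timeout:
the process blocks in read(), and when SIGALRM arrives the call fails
with errno 4 (EINTR, "Interrupted system call"). Below is a minimal
Python sketch of that general pattern, not Apache's actual code;
`read_with_timeout` and `ReadTimeout` are hypothetical names, and it
assumes a Unix system and a caller in the main thread. Since Python 3.5
(PEP 475) an interrupted read is retried automatically unless the signal
handler raises, so the handler raises an exception to abandon the read.

```python
import os
import signal


class ReadTimeout(Exception):
    """Raised by the SIGALRM handler to abandon a blocked read."""


def _on_alarm(signum, frame):
    # Raising here makes the blocked os.read() fail with this exception.
    # If the handler simply returned, Python 3.5+ would retry the read
    # instead of reporting EINTR, and the timeout would have no effect.
    raise ReadTimeout()


def read_with_timeout(fd, nbytes, timeout):
    """Block in os.read(fd, nbytes) for at most `timeout` seconds.

    Returns the bytes read, or None if SIGALRM fired first.
    This sketch ignores the small race where the timer expires just
    after read() has returned.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    if old_handler is None:  # previous handler was not installed from Python
        old_handler = signal.SIG_DFL
    signal.setitimer(signal.ITIMER_REAL, timeout)  # SIGALRM after `timeout` seconds
    try:
        data = os.read(fd, nbytes)
    except ReadTimeout:
        data = None
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer if still pending
        signal.signal(signal.SIGALRM, old_handler)  # restore the previous handler
    return data
```

For example, with `r, w = os.pipe()` and nothing written to `w`,
`read_with_timeout(r, 4096, 0.2)` returns None after roughly 0.2 seconds,
the same read / EINTR / SIGALRM sequence seen in the trace.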

The third column is the time in seconds relative to the previous entry.
Obviously, a read on this descriptor shouldn't be taking 17 seconds to
complete. This appears to be the helper process waiting for some
response from the main server process indicating that it has sent out
the data. A ktrace on the main server process shows no unusual stalls;
it wakes up from select fairly consistently, at least once a second.
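
Because the third column is the delta from the previous entry, stalls
can be picked out of a long dump mechanically rather than by eye. A rough
Python sketch, assuming the relative-timestamp layout shown in the
excerpt (PID, command, delta, record type, details), which is presumably
kdump -R output; `long_gaps` is a hypothetical helper name:

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed layout of one relative-timestamp kdump record, as in the excerpt:
#   "   1144 httpd    16.842855 RET   read -1 errno 4 Interrupted system call"
# PID, command name, seconds since the previous record, record type, details.
RECORD_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+\.\d+)\s+(\S+)\s*(.*)$")


def long_gaps(lines: Iterable[str], threshold: float = 1.0) -> Iterator[Tuple[int, float, str]]:
    """Yield (pid, delta_seconds, record) for records whose delta exceeds `threshold`."""
    for line in lines:
        match = RECORD_RE.match(line)
        if match is None:
            continue  # not a record line (e.g. a wrapped continuation or data dump)
        pid, _command, delta, kind, details = match.groups()
        if float(delta) > threshold:
            yield int(pid), float(delta), f"{kind} {details}".strip()
```

For example, `for pid, delta, rec in long_gaps(open("trace.txt"), threshold=2.0): print(pid, delta, rec)`
would print only the stalled records, which makes it easy to confirm
that the main server's trace really has no gaps of that size.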

Has anyone else seen this behavior, and do you have any idea how to fix
the problem?


Thanks,
David


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

