httpd-dev mailing list archives

From Alexei Kosut <ako...@organic.com>
Subject fcntl() errors on Solaris
Date Wed, 13 Aug 1997 00:58:50 GMT
I've been seeing this pretty consistently over the past few weeks. Since
I use Apache mostly for development (of Apache and related bits), I
start and stop Apache a lot: dozens of times an hour. After five or six
hours, I always manage to get it into a state where Apache no longer works
right: it forks only one or two children instead of the five I have set,
and I get tons of these in my error log:

[Tue Aug 12 17:41:29 1997] fcntl: F_SETLKW: No record locks available
[Tue Aug 12 17:41:29 1997] - Error getting accept lock. Exiting!

I'm using Solaris 2.5.1, and I'm presuming this is because Apache uses
fcntl to lock a file for its accept_mutex, and then exits (see my
earlier message about how sig_term() works, and why it sucks) without
unlocking the file. All those locked files add up, I guess.
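
To make the locking concrete, here is a minimal sketch of an fcntl-based
file lock with an explicit release. It is not Apache's actual code: the
function names (set_file_lock, sketch_lock_open, sketch_lock_acquire,
sketch_lock_release) are made up for illustration. For a local file the
kernel drops a process's record locks when the process exits; that stale
lock state can pile up over NFS is only my assumption, which is why the
sketch releases the lock explicitly.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take (F_WRLCK) or release (F_UNLCK) a lock on
 * the whole file, retrying if a signal interrupts the blocking wait. */
static int set_file_lock(int fd, short type)
{
    struct flock fl = {0};
    int rc;

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc < 0 && errno == EINTR);
    return rc;                  /* 0 on success, -1 with errno set */
}

/* Open (creating if needed) the lock file; returns an fd or -1. */
int sketch_lock_open(const char *path)
{
    return open(path, O_CREAT | O_WRONLY, 0644);
}

/* Block until this process holds the lock. */
int sketch_lock_acquire(int fd)
{
    return set_file_lock(fd, F_WRLCK);
}

/* Release the lock; the point is to call this before exiting. */
int sketch_lock_release(int fd)
{
    return set_file_lock(fd, F_UNLCK);
}
```

A child would call sketch_lock_acquire() before accept() and
sketch_lock_release() right after it, and a clean shutdown path would
call sketch_lock_release() once more before exit().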

If I leave it overnight, and come back, it works again for a while.

Oh, and the files are being locked over NFS. That possibly has something
to do with it.

Still, Apache should make sure it unlocks its accept mutex before it
quits. I think the children do the right thing wrt mutexes when they are
restarted, though I'm not sure. I'm sure, however, that a shutdown
(SIGTERM) does the wrong thing.

In fact, when the Apache parent gets a SIGTERM, it should do the following
(IMHO) or something similar (instead of just killpg(SIGKILL) and exit):

1. Set a shutdown_pending flag, just as SIGHUP sets restart_pending. By
   not exiting directly, you let alarms and things work correctly.
2. standalone_main then checks shutdown_pending at the same time
   it checks restart_pending.
3. Act similarly to a non-graceful restart: do a killpg(SIGTERM) (this
   will shut down the children correctly, allowing child_exit to be
   called and pool cleanups to be done).
4. Call destroy_pool(), so any cleanups registered for the main server
   are done (this might include stuff besides freeing memory:
   disconnecting from a database or shutting down a companion
   process. Whatever.)
5. Maybe wait a few seconds, and do a killpg(SIGKILL), just to make
   sure.
6. Now exit.

Yes, it takes a bit longer, but I think this is important to making sure
Apache does the right thing. And as I've said, I will veto any release
that includes a child_exit API phase that doesn't work (which includes
the current 1.3a2-dev).

If the above sequence of events sounds good, I can make a patch. Or
someone else can.

-- Alexei Kosut <akosut@organic.com>

