From: (David Robinson)
Subject: Re: patch 20 death
Date: Fri, 08 Dec 1995 18:27:00 GMT
> ok here goes...

> Apache spawns to death in most cases, possibly filling the error_log too.

Ok, here's what the patch does.
*** http_main.c.orig2	Tue Oct  3 14:30:50 1995
--- http_main.c	Sun Oct  8 18:20:03 1995
*** 739,746 ****
      dupped_csd = -1;
      child_num = child_num_arg;
      requests_this_child = 0;
-     update_child_status (child_num, SERVER_READY);
      reopen_scoreboard (pconf);
      /* Only try to switch if we're running as root */
      if(!getuid() && setuid(user_id) == -1) {
--- 739,746 ----
      dupped_csd = -1;
      child_num = child_num_arg;
      requests_this_child = 0;
      reopen_scoreboard (pconf);
+     update_child_status (child_num, SERVER_READY);
      /* Only try to switch if we're running as root */
      if(!getuid() && setuid(user_id) == -1) {

The bug was that update_child_status was using the file descriptor inherited
from the parent. A descriptor inherited across fork() shares its file offset
with the parent, so the child's update moved the parent's file pointer too;
there is a small risk that this could cause the parent to read or write the
wrong location in the scoreboard file. For example, it could read its image
of the scoreboard starting halfway through, which would cause great problems
(though the damage would be limited to excess spawning of children). Calling
update_child_status after reopen_scoreboard makes the child use its own
descriptor instead.
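A small stand-alone program makes the shared-offset problem concrete. This is a hypothetical sketch, not Apache code: the function name parent_offset_after_child_write and the file path are made up, and a single 5-byte write stands in for whatever update_child_status actually writes. It relies only on POSIX fork() semantics, under which the child's copy of a descriptor refers to the same open file description, and hence the same offset, as the parent's.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo, not Apache code. The parent opens 'path', forks,
 * and the child writes 5 bytes. With reopen == 0 the child writes
 * through the inherited descriptor (modelling the old order, where
 * update_child_status ran before reopen_scoreboard). With reopen != 0
 * the child opens the file again first (modelling the patched order).
 * Returns the parent's file offset afterwards, or -1 on error. */
off_t parent_offset_after_child_write(const char *path, int reopen)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        int cfd = fd;   /* inherited: shares the parent's file offset */
        if (reopen) {
            /* A fresh open() creates a new open file description
             * with its own offset, independent of the parent's. */
            cfd = open(path, O_RDWR);
            if (cfd == -1)
                _exit(1);
        }
        _exit(write(cfd, "child", 5) == 5 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    /* Where the parent's next read() or write() would start. */
    off_t off = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return off;
}
```

On a POSIX system, calling it with reopen = 0 should return 5 (the child's write advanced the parent's offset), while reopen = 1 should return 0, which is the behaviour the patch relies on.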

I do not believe that there is anything wrong with the patch.

There must be a bug elsewhere in Apache. It should be fixed.
I note that the scoreboard code does very little error checking.
The author should be embarrassed. 8-)

A patch is supplied which adds this error checking. Can you _please_
apply this patch to a 'broken' system and see if it catches anything?
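The error-checking patch itself is not reproduced here. As a rough illustration of the kind of checking meant, the sketch below assumes the scoreboard is a file of fixed-size records; struct demo_score, open_demo_scoreboard, read_slot and write_slot are hypothetical names, not Apache's. It also swaps in pread()/pwrite() from current POSIX, which may not be what the real patch does: they transfer at an explicit offset without moving the descriptor's file pointer, so besides detecting errors and short transfers they sidestep the shared-offset problem described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size scoreboard slot; Apache's real record differs. */
struct demo_score {
    int pid;
    int status;
};

/* Hypothetical helper: open (creating if needed) a scoreboard file. */
int open_demo_scoreboard(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Read or write exactly 'len' bytes at absolute offset 'off'.
 * pread/pwrite leave the descriptor's file offset untouched.
 * Returns 0 on success, -1 on error; a short transfer (e.g. reading
 * past the end of the file) is reported with errno = EIO. */
static int xfer_exact(int fd, void *buf, size_t len, off_t off, int writing)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = writing
            ? pwrite(fd, (char *)buf + done, len - done, off + (off_t)done)
            : pread(fd, (char *)buf + done, len - done, off + (off_t)done);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0) {
            errno = EIO;            /* file shorter than expected */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

/* Read slot number 'slot' into *out; rejects negative slot numbers. */
int read_slot(int fd, int slot, struct demo_score *out)
{
    if (slot < 0) {
        errno = EINVAL;
        return -1;
    }
    return xfer_exact(fd, out, sizeof *out,
                      (off_t)slot * (off_t)sizeof *out, 0);
}

/* Write *in to slot number 'slot'; rejects negative slot numbers. */
int write_slot(int fd, int slot, const struct demo_score *in)
{
    if (slot < 0) {
        errno = EINVAL;
        return -1;
    }
    return xfer_exact(fd, (void *)in, sizeof *in,
                      (off_t)slot * (off_t)sizeof *in, 1);
}
```

A caller would check every return value, for example logging and exiting the child if write_slot() fails, rather than carrying on with a scoreboard it cannot trust.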

> My personal observations (possibly unrelated, but who knows..)
> The machine at Cardiff had MaxClients set to 18, and still the problem
> continued. When I recompiled 1.0.0 without mod_user and mod_negotiation
> the problem disappeared.
> Someone suggested that the symptoms could relate to the scoreboard getting
> nuked, so we found patch 20 to be a candidate.
> People using SunOS, Linux, *BSD* (I can't keep up with these names), who
> experienced the problem have reversed patch 20. So far, there are no
> reports that the problem exists after 20 is removed.
> We're sure the problem came in after 0.8.14.

Are you saying that several _unrelated_ changes 'fixed' the problem??
Sounds like a rogue pointer that's sensitive to the data/code layout.


