httpd-dev mailing list archives

Subject Re: Question: error checking: how important? [PATCH]
Date Sat, 06 Mar 1999 17:16:41 GMT


John Bley writes:
> On Sat, 6 Mar 1999 wrote:
> > Exactly how to log is also a problem.  This routine takes no
> > dynamic context info so you can't record the request or the
> > server that was deluded into thinking the directory in question
> > was useful.
> Well, roughly 18% of the ap_log_error calls in main/ are also lacking 
> this information...

Yeah.  It is also what makes the /ap directory dependent on the /main
directory - i.e. http_log is used by routines in the /ap directory (which
I think is bogus).

> even if a log message can't provide specific 
> information along these lines, it will have a log context, and this at 
> least leaves some clues ("always crashing when a lynx user requests this CGI"
> or something...)

Logging is good, but there is a place in the layering of the "design"
for it.  I personally - these days - think it comes later than at the
OS API facade.

> > Few programs include code to handle errors that if they occur
> > the entire enterprise is a hopeless mess.  So for example the
> > calls on dup2 fall into that category.  Programs willing to
> > call longjmp at the error site sometimes do, but they are rare.
> Excellent point - in some cases, the process won't be able to log the 
> error at all, and as Jim pointed out, in other cases it makes more sense 
> to do something other than log - try again or switch tactics.  But if the 
> dup2 situation is hopeless, why does the WIN32 code log for it but the 
> Unix code doesn't?

Like I say, these trade-offs are appropriately left to the judgement of
the author (always open to revisiting of course).  The Win32 authors
are a more paranoid lot since we don't trust that platform not to
surprise us.

> I was somewhat appalled...

 Rains: "I'm shocked, shocked to find that gambling is going on here!"
 croupier: "Your winnings, sir."

Very deep those lines.  Successful systems involve just the right
degree of gambling.  I cannot too highly recommend reading Richard
Gabriel's essay "Worse is Better" on this topic.

> > So I'm sure we would love to see cases where you think that
> > the judgement call made was deluded!
> > One at a time.
> Here's one.  138 more to look at.
> diff -Burp apache-1.3/src/main/http_core.c apache-1.3-patched/src/main/http_core.c
> --- apache-1.3/src/main/http_core.c	Wed Feb 24 09:12:26 1999
> +++ apache-1.3-patched/src/main/http_core.c	Sat Mar  6 11:10:16 1999
> @@ -2937,7 +2937,10 @@ static void mmap_cleanup(void *mmv)
>  {
>      struct mmap *mmd = mmv;
> -    munmap(mmd->mm, mmd->length);
> +    if(munmap(mmd->mm, mmd->length))
> +	ap_log_error(APLOG_MARK, APLOG_ERR, NULL, 
> +		"Couldn't munmap memory of length %d at 0x%x", 
> +		mmd->length, mmd->mm);
>  }
>  #endif

Great example.  This is a static routine used only once.  It would be a foul-up
of truly amazing proportions if the logic of the server were so deeply troubled
that this error arose.  Meanwhile a few sad things happen when we add the code
to handle the error.
 1. There are many more tokens for me to read and understand
    every time I visit this routine.
 2. Many more chances for errors.  Was errno set?  Does the
    printf string match up?  Are the signed/unsigned types matching
    up?  Are they compatible on all N platforms?
 3. There are doc issues to consider, since a "real product"
    (you know, the ones that never ship) would enumerate all its
    error messages.
 4. Code that never fires is a real problem in systems.
 5. New coders presume this code fires and they allocate space
    in their heads for a "what is that for" task.

I'd argue that the judgement of the author was exactly right in
this case.
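
To make point 2 concrete, here is a minimal sketch of what a careful
version of that check would have to get right.  It is an illustration
under assumptions, not the code in http_core.c: struct mmap_region and
checked_munmap are hypothetical names, the field types are assumed, and
it reports to stderr rather than through ap_log_error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the struct in the patch; field types are assumed. */
struct mmap_region {
    void *mm;
    size_t length;
};

/* Returns 0 on success, -1 after reporting a failed munmap(). */
static int checked_munmap(struct mmap_region *r)
{
    if (munmap(r->mm, r->length) == -1) {
        int saved = errno;  /* capture errno before any other call can change it */

        /* The format string has to agree with the argument types:
         * size_t is cast to unsigned long for %lu, and %p takes a void *. */
        fprintf(stderr, "munmap of %lu bytes at %p failed: %s\n",
                (unsigned long) r->length, r->mm, strerror(saved));
        return -1;
    }
    return 0;
}
```

Even in this small form there are three separate things to keep in sync
(the errno capture, the two format specifiers, and the casts), which is
the sort of overhead the list above is describing.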

In the extremely unlikely case that all 138 cases are like this
one, we would be creating nearly a gross of these tangles for implementors,
testers, and documenters to contemplate.  Each time they would be
peeved to discover that in actuality the case just doesn't arise.

All code is full of "error cases" the author has chosen to ignore,
particularly in C, for at least two reasons: arithmetic is so amusing,
and NULL pointers are so common.
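
Two tiny examples of the kind of ignored case meant here.  Both
functions are hypothetical and written only for illustration; each is
perfectly ordinary C and each silently assumes its error case away.

```c
#include <string.h>

/* Ignores overflow: a + b is undefined behaviour when the sum does not
 * fit in an int, yet almost nobody checks for it. */
static int midpoint(int a, int b)
{
    return (a + b) / 2;
}

/* Ignores NULL: strlen(NULL) is undefined behaviour, so this relies on
 * every caller passing a real string. */
static size_t name_length(const char *name)
{
    return strlen(name);
}
```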

If you really want to be miserable, consider how all the Winsock
API routines return unsigned ints for the sockets, which we then
immediately slam into signed ints, and then check against
SOCKET_ERROR, which just happens to be what INVALID_SOCKET transforms
into when slammed into a signed int.  It's clear to me that is
_exactly_ what the Winsock designers intended we should do.
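
A rough sketch of that coincidence, using stand-in definitions so it
compiles anywhere.  The demo_ names are assumptions that only mirror the
shape of the Winsock declarations (an unsigned socket type, an
all-bits-set invalid value, and -1 as the error code); they are not
copied from the real headers.

```c
/* Stand-ins for illustration only; not the real Winsock declarations. */
typedef unsigned int demo_socket_t;                  /* sockets are unsigned */
#define DEMO_INVALID_SOCKET ((demo_socket_t) ~0u)    /* all bits set */
#define DEMO_SOCKET_ERROR   (-1)                     /* signed error code */

/* Store an unsigned socket handle in a signed int.  The out-of-range
 * conversion is implementation-defined in C; on the usual
 * two's-complement platforms all-bits-set becomes -1. */
static int slam_into_int(demo_socket_t s)
{
    return (int) s;
}

/* Nonzero if the signed copy compares equal to the error code. */
static int looks_like_error(int fd)
{
    return fd == DEMO_SOCKET_ERROR;
}
```

On such platforms looks_like_error(slam_into_int(DEMO_INVALID_SOCKET))
is nonzero, so the check works, but only because the two constants
happen to share a bit pattern.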

Like I say, the trick is finding the ones that are a bug or generate a very
interesting discussion.  I'd be surprised if at least half of those 138
cases can do at least the interesting discussion part.

Thanks for contributing.

 - ben
