Message-Id: <3.0.3.32.19970819165925.008784e0@localhost>
Date: Tue, 19 Aug 1997 16:59:25 -0700
To: new-httpd@apache.org, Coar@decus.org
From: Brian Behlendorf <brian@organic.com>
Subject: Re: Keep it simple
In-Reply-To: <97081914155004@decus.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: new-httpd-owner@apache.org
Precedence: bulk
Reply-To: new-httpd@apache.org

At 02:15 PM 8/19/97 -0400, Rodent of Unusual Size wrote:
>From the fingers of Brian Behlendorf flowed the following:
>>
>>Regarding enhancing custom logging to do a bunch of conditional things,
>>named formats, replacements for mod_log_agent/referrer, etc:
>>
>>I very strongly feel that this is a slippery slope for us to be walking
>>upon; that everything proposed here can be done completely *outside* the
>>web server software, with a perl script or even a separate binary program
>>we release.
>
>    I can sum up the problem I see with that in one word: CustomLog.  No
>    way can something we provide adapt nicely to all the different
>    formats people have in use.  Not without significant hackery on
>    their part - in which case why bother, if it's not good enough OOTB?

So you either have the separate script understand CustomLog, or you say "to
use this script we recommend you log using the following format...".  If
it's in perl it's trivial to hack it to parse different formats anyway.
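To make the first option concrete, here is a minimal Perl sketch of a script that "understands CustomLog": it turns a LogFormat string into a line parser whose fields are keyed by the directive itself (`%h`, `%>s`, ...). The helper name `format_to_parser` is hypothetical. Only plain directives such as `%h`, `%>s` and `%{Referer}i` are recognized (no `%%`, no status-code conditions). Each directive is captured non-greedily up to the next piece of literal text. That is enough for the common and combined formats, but it can mis-split a field that happens to contain its own separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a parser for lines written with a given
# LogFormat string. Only plain directives (%h, %>s, %{Referer}i, ...) are
# recognized; %% and status-code conditions are left out of this sketch.
sub format_to_parser {
    my ($format) = @_;
    my @names;      # directives, in the order they appear in the format
    my $re = '';    # regexp source built up piece by piece
    # split with a capture group keeps the directives as separate pieces
    for my $piece (split /(%>?\{[^}]*\}[a-zA-Z]|%>?[a-zA-Z])/, $format) {
        next if $piece eq '';
        if ($piece =~ /\A%/) {
            push @names, $piece;
            $re .= '(.*?)';            # shortest run up to the next literal text
        } else {
            $re .= quotemeta $piece;   # separators such as ' ', ' "', '" '
        }
    }
    my $compiled = qr/\A$re\z/;
    return sub {
        my ($line) = @_;
        chomp $line;
        my @values = $line =~ $compiled or return;   # undef if no match
        my %record;
        @record{@names} = @values;     # key each value by its directive
        return \%record;
    };
}
```

Feeding it the same string that appears in the server's LogFormat line keeps the script in step with whatever format a site actually uses, e.g. `format_to_parser('%h %l %u %t "%r" %>s %b')`. Note that `%t` comes back with its square brackets.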

>>             Its inclusion in Apache will add code, directives, promises of
>>future support, and complexity, with not much gain over using a separate
>>process.
>
>    I disagree.  I see massive, massive gain potential in terms of being
>    able to control disk activity and space consumption through
>    controlling what gets logged.

With reliable piped logs you can throw out the hits you don't want
immediately.  With buffered logfile writes you can also lower disk activity.
For high-volume systems where I/O is an issue, most folks are logging to a
different disk or disk interface anyway.

>    I'm sure there are people who really only want to log accesses to
>    their stuff from people *outside* their department (or whatever),
>    but they have to wade through or otherwise discard the mass of stuff
>    from *inside* in order to find them.

  next if $ip =~ /insideregexp/

How is this so much harder than whatever rule you'd put in an Apache config
file?
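Run as a piped log, that rule becomes a complete filter. The sketch below is hypothetical throughout (script name, arguments, file path): it expects an address prefix and an output file on the command line, wired in with something like `CustomLog "|/usr/local/bin/outside-only.pl 10.1. /var/log/httpd/outside_log" common`. It also swaps the regexp for a literal prefix match, so the config line needs no shell quoting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical piped-log filter: drop hits from inside addresses and
# append the rest to a log file.
# Usage (assumed): outside-only.pl PREFIX LOGFILE, log lines on STDIN.

# True if the client address (first field of the line) starts with $prefix.
sub is_inside {
    my ($line, $prefix) = @_;
    my ($ip) = split ' ', $line, 2;
    return defined $ip && index($ip, $prefix) == 0;
}

# Copy every line that is not from an inside address from $in to $out.
sub filter_lines {
    my ($in, $out, $prefix) = @_;
    while (my $line = <$in>) {
        next if is_inside($line, $prefix);   # throw the hit out immediately
        print {$out} $line;
    }
}

if (@ARGV == 2) {
    my ($prefix, $logfile) = @ARGV;
    open my $out, '>>', $logfile or die "can't append to $logfile: $!";
    # No autoflush: output stays block-buffered, so many hits share one write.
    filter_lines(\*STDIN, $out, $prefix);
    close $out or die "can't close $logfile: $!";
}
```

Leaving the output handle block-buffered is what turns many hits into few disk writes. The cost is that a crash of the filter can lose whatever is still sitting in the buffer.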

>    Being able to log by status to a separate file - such as a pipe to a
>    notifier - would be really cool, too (an outgrowth of Rob's idea).

  if ($status eq '404') {
	print NOTIFIER "404 error: $url\n   referenced from $referrer\n"
  }

My point: any language you create to express the above in a config file
could also be used to express it in a perl script.  Having it embedded in
the server adds very little.
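The notifier fragment can be fleshed out the same way. This is a sketch under stated assumptions: the log is in the combined format (request, status, bytes, then the quoted Referer), the name `notice_for` and the single notices-file argument are hypothetical, and the notices go to a file or named pipe that some separate notifier program reads.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical sketch: report 404s from a combined-format log stream.
# Assumed line layout: ... "%r" %>s %b "%{Referer}i" "%{User-agent}i"

# Returns a notice string for a 404 line; undef (in scalar context) otherwise.
sub notice_for {
    my ($line) = @_;
    # Capture the URL from the request line, the status, and the Referer.
    my ($url, $status, $referrer) =
        $line =~ m{"\S+ (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"} or return;
    return unless $status eq '404';
    return "404 error: $url\n   referenced from $referrer\n";
}

# Usage (assumed): notify-404.pl NOTICE_FILE, with the access log on STDIN.
if (@ARGV == 1) {
    open my $notifier, '>>', $ARGV[0] or die "can't append to $ARGV[0]: $!";
    $notifier->autoflush(1);    # deliver each notice as soon as it is written
    while (my $line = <STDIN>) {
        my $notice = notice_for($line);
        print {$notifier} $notice if defined $notice;
    }
    close $notifier or die "can't close $ARGV[0]: $!";
}
```

Unlike the bulk log, the notices handle is flushed on every write, since a notification that sits in a buffer until the log rotates is not much of a notification.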

>>           What more do we have to gain by putting that logic directly in
>>Apache?
>
>    Performance, flexibility.  

Dubious.  Performance is hindered because it's extra work the Apache child
has to do before it can handle the next request.  Flexibility is hindered
because it has to be done either with whatever "primitives" we provide in
the config language (awkward commands like DoNotLogIf or something) or by
hacking C code directly into mod_log_config.  We don't have a real
programming language in our config files, so we'll keep on adding obtuse
primitives for logging conditions and features until we've reimplemented a
parallel to perl.

>So far, the conceptual +1s seem to be
>    from Ken, Rob, Marc, Randy, Paul, and Dean - and I think most of
>    those are based on the flexibility issue rather than the performance
>    one.  (Opinions about implementation differ more widely. ;-)

I think the consensus has always been that if someone has an idea and is
willing to champion it through to implementation, then it gets in.  I
don't think that's /always/ a healthy way to go.  I'm sure folks would use
these extra logging features; I probably even would; but that doesn't mean
it's the best way to accomplish the goal of better logfile handling.

I've spent a lot of time recently thinking about a free sendmail
replacement called qmail.  Qmail's philosophy is that of lots of little
cooperating processes working together, all very lightweight and
single-purpose, with different UIDs and permissions for different
purposes.  The "qmail way" to add features is to plug a script or
program into one of the many different ways these programs communicate with
each other.  All of this, instead of putting all the functionality into a
monolithic binary, like sendmail.  The more I look at qmail's model the
more I like it.  

Web serving is a time-critical situation.  Everything in the server should
be focused on getting a response to a request as quickly as possible,
and moving on to the next request as quickly as possible.  Log file parsing,
cleansing, and analysis is not time-critical to the request/response
environment; thus I feel it should be done with a separate program.  

We needed configurable logging to be able to expand the amount of
information we could capture from the server; no doubt about that.  

	Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
"Why not?" - TL           brian@organic.com - hyperreal.org - apache.org