httpd-dev mailing list archives

From r..@ai.mit.edu (Robert S. Thau)
Subject Re: Configurable logging formats... EXPERIMENTAL module.
Date Sun, 30 Jul 1995 11:32:19 GMT
   Date: Sat, 29 Jul 1995 22:57:53 -0700 (PDT)
   From: Brian Behlendorf <brian@organic.com>

   Testing it now on port 8001 on hyperreal (conf file 
   /usr/local/www.tools/apache/conf/httpd.conf.s for those interested).  
   Apparently there's a bug in that an extra copy of every request
   gets logged to SERVER_ROOT/logs/access_log, for both virtual and 
   non-virtual hosts.

Are you sure you remembered to take mod_log_common out?  If you don't,
it will still be there, but it won't be configured (mod_log_config
gets all the TransferLog commands), so you'll get its default
behavior, which is as described.

   > Anyway, the LogFormat directive is along the same basic lines I
   > remember people talking about; the LogFormat which reproduces CLF is
   > 
   >   LogFormat "%h %l %u %t \"%r\" %s %b"
   >   
   > (those being host, logname, user, time, request, status, bytes-sent).
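
To make the CLF format string above concrete, here is a minimal Python
sketch of the substitution it implies. It is not the module's code: the
SAMPLE record, its values, and the function name format_log_line are
assumptions made up for illustration, and only plain one-letter
directives are handled.

```python
import re

# Hypothetical request record, keyed by directive letter (an assumption
# for this sketch; the real module reads these from the request structure).
SAMPLE = {
    "h": "host.example.com",              # remote host
    "l": "-",                             # remote logname
    "u": "-",                             # authenticated user
    "t": "[30/Jul/1995:11:32:19 +0000]",  # time, already formatted
    "r": "GET /index.html HTTP/1.0",      # request line
    "s": "200",                           # status
    "b": "1043",                          # bytes sent
}

# The CLF format from the message, without the config-file quoting.
CLF = '%h %l %u %t "%r" %s %b'


def format_log_line(fmt, fields):
    """Replace each %x directive with fields[x]; unknown letters log '-'."""
    return re.sub(r"%([a-zA-Z])",
                  lambda m: fields.get(m.group(1), "-"),
                  fmt)


if __name__ == "__main__":
    print(format_log_line(CLF, SAMPLE))
```

Note that the quotes around %r are literal characters in the format
string; nothing in the sketch adds or escapes them.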

   There's just a few more I'd like to add:

   %U for unix time format

Ummm... given %u for user, it would probably be better for %U to be
URI.  (Then, with the current code, %<U would be URI requested, as
modified by MultiViews, and %>U would be URI delivered --- the '<' and
'>' select original or final request, and are in the code, but not
written up; %U itself would be trivial to add (<5 lines)).

As for the Unix time format, it might be better to have a %{...}t option, where the
{...}, if present, is a time format string.  This isn't very hard, and
it does avoid consuming the (one-character) namespace.
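
A rough Python sketch of how a %{...}t directive could behave, assuming
the braces hold a strftime-style format and that a bare %t falls back to
a CLF-like default. The function name expand_time, the UTC choice, and
the default string are all assumptions for illustration, not the
module's behavior.

```python
import re
import time

# Assumed default for a bare %t; CLF-style, fixed to UTC for simplicity.
DEFAULT_TIME_FORMAT = "[%d/%b/%Y:%H:%M:%S +0000]"


def expand_time(fmt, when):
    """Expand %t and %{strftime-format}t for `when` (seconds since epoch)."""
    tm = time.gmtime(when)

    def repl(match):
        # group(1) is None for a bare %t and '' for %{}t; both use the default.
        time_format = match.group(1) or DEFAULT_TIME_FORMAT
        return time.strftime(time_format, tm)

    return re.sub(r"%(?:\{([^}]*)\})?t", repl, fmt)


if __name__ == "__main__":
    now = time.time()
    print(expand_time("%t", now))
    print(expand_time("%{%Y-%m-%d %H:%M:%S}t", now))
```

With this shape, a Unix-time or ISO-style timestamp is just another
format string, so no extra one-character directive is needed.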

   %a for actual object retrieved when it differs from the object requested 
	   - the heuristics for this need defining I suppose.

See above for my thoughts on this.

   %T for total time to deliver file, in milliseconds (okay, I'm dreaming, 
	   shoot me)

This would require a record of arrival time in the request structure
(not the connection --- think keep-alives and HTTP-NG), so I'd rather
not do it for this release.  For the public release *after* next, this
is easy and probably worthwhile.
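
To illustrate why the arrival time belongs on the request rather than
the connection, here is a small Python sketch. The Request class, its
fields, and elapsed_ms are hypothetical names; the point is only that
each request on a kept-alive connection carries its own start time.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical per-request record; arrival is stamped at creation."""
    uri: str
    arrival: float = field(default_factory=time.monotonic)


def elapsed_ms(req, now=None):
    """Milliseconds between the request's arrival and `now`."""
    if now is None:
        now = time.monotonic()
    return int(round((now - req.arrival) * 1000))


if __name__ == "__main__":
    # Two requests on one (imaginary) kept-alive connection each get
    # their own arrival time, so each %T would be measured separately.
    first = Request("/index.html")
    second = Request("/logo.gif")
    print(elapsed_ms(first), elapsed_ms(second))
```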

   Hmm.  The rest seem to be covered by your %i and %o (brilliant!)

They cover a lot of things.  Less code to write...

   > In addition to these, you can ask for %{Foobar}i and %{Foobar}o, to
   > get at the contents of some request or response MIME header,
   > respectively (e.g., you can use %{Referer}i to get the Referer).
   > Also, you can conditionalize the appearance of certain fields by HTTP
   > status code; for instance, '%!200,304,302{Referer}i' logs Referer only
   > on requests which got some sort of nontrivial error, and a '-'
   > otherwise to keep parsing sane.  (This might be useful if your main
   > use for Referer: is tracking down pages with bogus links to your site,
   > and you don't want it taking up space otherwise --- in fact,
   > %404{Referer}i might be what some people really want; file not found
   > *only*).
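
The status-code conditions described above can be sketched in Python as
follows. The grammar is inferred only from the two examples in the
message ('%!200,304,302{Referer}i' and '%404{Referer}i'); the names
DIRECTIVE and expand_conditional are made up, only the 'i' (request
header) form is handled, and header lookup is an exact-case match for
simplicity.

```python
import re

# %[!]code,code,...{Header}i  (status list and '!' are both optional)
DIRECTIVE = re.compile(r"%(!?)([\d,]*)\{([^}]+)\}i")


def expand_conditional(fmt, status, headers_in):
    """Expand %...{Header}i, logging '-' when the status condition fails."""

    def repl(match):
        negate = match.group(1) == "!"
        codes = match.group(2)
        name = match.group(3)
        if codes:
            listed = status in {int(c) for c in codes.split(",") if c}
            # Without '!': log only if listed.  With '!': only if not listed.
            if listed == negate:
                return "-"
        return headers_in.get(name, "-")

    return DIRECTIVE.sub(repl, fmt)


if __name__ == "__main__":
    headers = {"Referer": "http://example.com/links.html"}
    for status in (200, 404):
        print(status, expand_conditional("%!200,304,302{Referer}i",
                                         status, headers))
```

Emitting '-' for a suppressed field keeps the number of fields per line
constant, which is what keeps parsing sane.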

   I like this, but there is one more condition I'd like to be able to test for
   - file type.  In our custom log_common we log referrers for every non-image
   access, to save space and ease readability.  If this is hard to do 
   efficiently it's not a big deal to me, I'll just hack around it, but it 
   might be something others want.

Geez.  I thought people might want to conditionalize on originating
host (I think NetSite allows this), but file type is a new one on me.
It's reasonable, though.  To make it work, I think we'd really need a
worked-out syntax for multiple types of conditions ---

   %[Status=404]{Referer}i
   %[Status!=200,304,302]{Referer}i
   %[Content-type!~image/]{Referer}i

...and then we'd need to worry about syntaxes (and semantics) for
combinations --- AND about dealing with LogFormat directives which
are large enough to be most conveniently put on multiple lines.
(The cmd_parms structure does allow a command handler to read
additional lines out of the config file, if it *really* wants to;
<LogFormat> sections, anyone?)
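
For discussion, a Python sketch of how the three bracketed examples
above might be evaluated. None of this syntax exists in the code; the
operator meanings are assumptions ('=' and '!=' test membership in a
comma-separated list, '!~' means "does not start with"), the names COND
and expand_bracketed are invented, and combinations of conditions are
not attempted.

```python
import re

# %[Key<op>value]{Header}i, with <op> one of '=', '!=', '!~'
COND = re.compile(r"%\[([A-Za-z-]+)(!=|!~|=)([^\]]*)\]\{([^}]+)\}i")


def expand_bracketed(fmt, attrs, headers_in):
    """Expand %[cond]{Header}i; a failed condition logs '-'."""

    def repl(match):
        key, op, arg, name = match.groups()
        actual = str(attrs.get(key, ""))
        if op == "=":
            ok = actual in arg.split(",")
        elif op == "!=":
            ok = actual not in arg.split(",")
        else:
            # '!~': assumed to mean "value does not start with this prefix"
            ok = not actual.startswith(arg)
        return headers_in.get(name, "-") if ok else "-"

    return COND.sub(repl, fmt)


if __name__ == "__main__":
    headers = {"Referer": "http://example.com/"}
    image = {"Status": 200, "Content-type": "image/gif"}
    page = {"Status": 200, "Content-type": "text/html"}
    fmt = "%[Content-type!~image/]{Referer}i"
    print(expand_bracketed(fmt, image, headers))
    print(expand_bracketed(fmt, page, headers))
```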

   Oh, and we need to bring up the question of escaping again.... sigh.

Yep.  There is no auto-escaping on anything in the current version;
all delimiters have to be literally present in the LogFormat string.
I take it you noticed what I said in the block comment up top of the
code on this issue, but for those who haven't looked yet:

 *      Note that
 * there is no escaping performed on the strings from %r, %...i and
 * %...o; some with long memories may remember that I thought this was
 * a bad idea, once upon a time, and I'm still not comfortable with
 * it, but it is difficult to see how to "do the right thing" with all
 * of '%..i', unless we URL-escape everything and break with CLF.

FWIW, making %@x URL-escape the result of %x, for all x, would again
be just a few lines of code...
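
A Python sketch of the %@x idea, assuming '@' simply URL-escapes
whatever %x would have produced. The name expand_escaped and the choice
to leave '/' unescaped are assumptions for illustration; only plain
one-letter directives are handled.

```python
import re
from urllib.parse import quote


def expand_escaped(fmt, fields):
    """Expand %x as-is and %@x with the value URL-escaped."""

    def repl(match):
        value = fields.get(match.group(2), "-")
        if match.group(1):
            # Percent-encode spaces, quotes, and other delimiters so the
            # value cannot break the surrounding log syntax.
            return quote(value, safe="/")
        return value

    return re.sub(r"%(@?)([a-zA-Z])", repl, fmt)


if __name__ == "__main__":
    fields = {"r": 'GET /a b"c HTTP/1.0'}
    print(expand_escaped('"%r"', fields))    # embedded quote breaks parsing
    print(expand_escaped('"%@r"', fields))   # embedded quote is escaped
```

Escaping the spaces in %r is exactly the break with CLF mentioned in
the comment quoted above, which is why it is opt-in here.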

rst
