Return-Path: owner-new-httpd
Received: by taz.hyperreal.com (8.6.12/8.6.5) id IAA29470; Sun, 30 Jul 1995 08:32:32 -0700
Received: from life.ai.mit.edu by taz.hyperreal.com (8.6.12/8.6.5) with SMTP id IAA29459; Sun, 30 Jul 1995 08:32:29 -0700
Received: from volterra (volterra.ai.mit.edu) by life.ai.mit.edu (4.1/AI-4.10) for new-httpd@hyperreal.com id AA00642; Sun, 30 Jul 95 11:32:22 EDT
From: rst@ai.mit.edu (Robert S. Thau)
Received: by volterra (4.1/AI-4.10) id AA23415; Sun, 30 Jul 95 11:32:19 EDT
Date: Sun, 30 Jul 95 11:32:19 EDT
Message-Id: <9507301532.AA23415@volterra>
To: new-httpd@hyperreal.com
Cc: new-httpd@hyperreal.com, new-httpd@mail.apache.org
In-Reply-To: (message from Brian Behlendorf on Sat, 29 Jul 1995 22:57:53 -0700 (PDT))
Subject: Re: Configurable logging formats... EXPERIMENTAL module.
Sender: owner-new-httpd@apache.org
Precedence: bulk
Reply-To: new-httpd@apache.org

   Date: Sat, 29 Jul 1995 22:57:53 -0700 (PDT)
   From: Brian Behlendorf

   Testing it now on port 8001 on hyperreal (conf file
   /usr/local/www.tools/apache/conf/httpd.conf.s for those interested).
   Apparently there's a bug in that an extra copy of every request gets
   logged to SERVER_ROOT/logs/access_log, for both virtual and
   non-virtual hosts.

Are you sure you remembered to take mod_log_common out?  If you don't,
it will still be there, but it won't be configured (mod_log_config gets
all the TransferLog commands), so you'll get its default behavior,
which is as described.

   > Anyway, the LogFormat directive is along the same basic lines I
   > remember people talking about; the LogFormat which reproduces CLF is
   >
   >   LogFormat "%h %l %u %t \"%r\" %s %b"
   >
   > (those being host, logname, user, time, request, status, bytes-sent).

   There's just a few more I'd like to add:

   %U for unix time format

Ummm... given %u for user, it would probably be better for %U to be URI.
(Then, with the current code, %U would be URI delivered --- the '<' and
'>' select original or final request, and are in the code, but not
written up; %U itself would be trivial to add (<5 lines)).

As for this, it might be better to have a %{...}t option, where the
{...}, if present, is a time format string.  This isn't very hard, and
it does avoid consuming the (one-character) namespace.

   %a for actual object retrieved when it differs from the object
      requested - the heuristics for this need defining I suppose.

See above for my thoughts on this.

   %T for total time to deliver file, in milliseconds (okay, I'm
      dreaming, shoot me)

This would require a record of arrival time in the request structure
(not the connection --- think keep-alives and HTTP-NG), so I'd rather
not do it for this release.  For the public release *after* next, this
is easy and probably worthwhile.

   Hmm.  The rest seem to be covered by your %i and %o (brilliant!)

They cover a lot of things.  Less code to write...

   > In addition to these, you can ask for %{Foobar}i and %{Foobar}o, to
   > get at the contents of some request or response MIME header,
   > respectively (e.g., you can use %{Referer}i to get the Referer).
   > Also, you can conditionalize the appearance of certain fields by HTTP
   > status code; for instance, '%!200,304,302{Referer}i' logs Referer only
   > on requests which got some sort of nontrivial error, and a '-'
   > otherwise to keep parsing sane.  (This might be useful if your main
   > use for Referer: is tracking down pages with bogus links to your site,
   > and you don't want it taking up space otherwise --- in fact,
   > %404{Referer}i might be what some people really want; file not found
   > *only*).

   I like this, but there is one more condition I'd like to be able to
   test for - file type.  In our custom log_common we log referrers for
   every non-image access, to save space and ease readability.
   If this is hard to do efficiently it's not a big deal to me, I'll
   just hack around it, but it might be something others want.

Geez.  I thought people might want to conditionalize on originating
host (I think NetSite allows this), but file type is a new one on me.
It's reasonable, though.  To make it work, I think we'd really need a
worked-out syntax for multiple types of conditions ---

  %[Status=404]{Referer}i
  %[Status!=200,304,302]{Referer}i
  %[Content-type!~image/]{Referer}i

...and then we'd need to worry about syntaxes (and semantics) for
combinations --- AND about dealing with LogFormat directives which are
large enough to be most conveniently put on multiple lines.  (The
cmd_parms structure does allow a command handler to read additional
lines out of the config file, if it *really* wants to; sections,
anyone?)

   Oh, and we need to bring up the question of escaping again....  sigh.

Yep.  There is no auto-escaping on anything in the current version; all
delimiters have to be literally present in the LogFormat string.  I
take it you noticed what I said in the block comment up top of the code
on this issue, but for those who haven't looked yet:

 * Note that
 * there is no escaping performed on the strings from %r, %...i and
 * %...o; some with long memories may remember that I thought this was
 * a bad idea, once upon a time, and I'm still not comfortable with
 * it, but it is difficult to see how to "do the right thing" with all
 * of '%..i', unless we URL-escape everything and break with CLF.

FWIW, making %@x URL-escape the result of %x, for all x, would again be
just a few lines of code...

rst
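
For readers following the thread, the status-conditional behaviour quoted above ('%!200,304,302{Referer}i', '%404{Referer}i') can be sketched roughly as follows. This is an illustration in Python, not the module's C code. The names `format_field` and `_DIRECTIVE` are hypothetical, only the `i` and `o` letters are handled, and the case-sensitive header lookup and the '-' for a missing header are simplifying assumptions of the sketch.

```python
import re

# Hypothetical pattern for a single directive such as
# '%!200,304,302{Referer}i', '%404{Referer}i' or '%{Referer}i'.
# Only 'i' (request header) and 'o' (response header) are covered.
_DIRECTIVE = re.compile(
    r"%(?:(?P<neg>!)?(?P<codes>\d+(?:,\d+)*))?\{(?P<name>[^}]+)\}(?P<kind>[io])"
)


def format_field(directive, status, headers_in, headers_out):
    """Return the text logged for one %...{Name}i / %...{Name}o directive."""
    m = _DIRECTIVE.fullmatch(directive)
    if m is None:
        raise ValueError("unsupported directive: %r" % (directive,))

    if m.group("codes"):
        codes = {int(c) for c in m.group("codes").split(",")}
        matched = status in codes
        if m.group("neg"):          # a leading '!' inverts the status list
            matched = not matched
        if not matched:
            return "-"              # placeholder keeps the field count stable

    headers = headers_in if m.group("kind") == "i" else headers_out
    # Assumption of this sketch: a missing header is also logged as '-',
    # and header names are matched case-sensitively.
    return headers.get(m.group("name"), "-")
```

With a request carrying a Referer header, `format_field("%404{Referer}i", 404, ...)` yields the header value, while any other status yields '-', which is the "file not found *only*" case from the quoted text.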