httpd-users mailing list archives

From Terry Kennedy <TE...@tmk.com>
Subject Re: [users@httpd] LogFormat Combined - many logfile lines with no Referer or User-agent
Date Fri, 29 Jul 2011 04:15:43 GMT
Rich Bowen wrote:

> These are optional fields which *may* be passed by a user agent. When they
> are passed, they are not reliable - that is, they may be spoofed, trivially.

  Understood. I'm not depending on them for any decision-making.

  The issue is that Analog discards those lines, so (for example) requests
logged for a particular file that are missing those two fields are discarded
and not counted for the purposes of things like "top 25 requested files".

  Also, the fields are completely absent, even though the escaped "s in the
LogFormat directive should produce either "" "" or "-" "-" when the fields
are missing.

> It would be interesting to see what version of what browser released in the 
> last 30 days.

  Most of the clients accessing the site in question are using ancient
browsers - in one case where I investigated fully, the client PC is running
Windows 2000 and IE 6. Some of its accesses had the Referer and User-Agent
logged, while others had them missing.

  One system where I have logs going back 2+ years shows a number of entries
with missing fields at a reasonably constant rate (200 to 5000 per month),
with no big jump. Oddly, that's the system where I'd expect new client
versions (like Firefox 5) to show up, yet the number of logged lines where
the fields are missing remains relatively constant.

  It seems that either both fields are properly present, or both are missing.
I was unable to locate any log lines which had either a Referer or "-" but
which were missing the User-Agent field.

> Oh. Hmm. That's interesting. What I would look for, in that case, is more
> than one LogFormat directive logging to the same location.

  I thought of that and checked it previously. However, I just checked it
again (Apache 2.0.63 system):

(0:12) www:/usr/local/etc/apache2# grep CustomLog *
httpd.conf:# a CustomLog directive (see below).
httpd.conf:#CustomLog /var/log/httpd-access.log common
httpd.conf:#CustomLog /var/log/httpd-referer.log referer
httpd.conf:#CustomLog /var/log/httpd-agent.log agent
httpd.conf:CustomLog /var/log/httpd-access.log combined
httpd.conf:#    CustomLog /var/log/dummy-host.example.com-access_log common
httpd.conf:CustomLog /var/log/httpd-deflate.log deflate
ssl.conf:CustomLog /var/log/httpd-ssl_request.log \
ssl.conf_orig:CustomLog /var/log/httpd-ssl_request.log \

  I only see 3 uncommented CustomLog directives, one for a combined log,
a separate one that logs deflate info, and a third one for SSL requests.
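
  Since the suggestion was about LogFormat as well as CustomLog, a broader
check could look something like the sketch below. The function name is made
up for illustration, and TransferLog is included only on the assumption that
any directive that writes an access log is worth listing.

```shell
#!/bin/sh
# List uncommented logging directives in the given config files, prefixed
# with file name and line number. Directive names are case-insensitive in
# Apache, hence -i. /dev/null is added so grep always prints the file name,
# even when only one file is passed.
active_log_directives() {
    grep -Ein '^[[:space:]]*(LogFormat|CustomLog|TransferLog)[[:space:]]' "$@" /dev/null
}

# Example call (path taken from the session above):
# active_log_directives /usr/local/etc/apache2/*.conf
```

  Two uncommented directives pointing at the same file would show up next to
each other in that output.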

  There also isn't any discernible pattern to the entries with the missing
fields - some CGI requests are logged with them, some without. Same for PHP.
Some are for 404s, some are for successful file access.

  I'm baffled. I wonder if anyone else is having the same issue, but didn't
notice it. For example, Analog will only complain about "Large number of 
corrupt lines in logfile" if they exceed a certain percentage threshold of
the total number of lines in the log file.

  The following (disgusting, I really should use awk) command string should
report the total number of lines missing the Referer and User-Agent fields
in a combined-format logfile, at least if the default timestamp format is
used:

cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l

  Anybody want to try it? (Of course, satisfy yourself that it can't do
anything evil first).
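
  For comparison, a minimal awk sketch of the same count. It splits each line
on double quotes and counts lines that have only the quoted request followed
by status and size. It assumes the request line itself contains no
double-quote characters; the function name and the sample log lines are
invented for illustration.

```shell
#!/bin/sh
# count_missing FILE: print how many lines stop after the status and size
# fields, i.e. carry no quoted Referer / User-Agent pair.
# Split on ": a full combined line yields 7 fields, a short one yields 3.
count_missing() {
    awk -F'"' 'NF == 3 { n++ } END { print n + 0 }' "$1"
}

# Hypothetical sample data: one complete line, one short line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.0.2.1 - - [29/Jul/2011:04:15:43 +0000] "GET /a.html HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/4.0"
192.0.2.2 - - [29/Jul/2011:04:15:44 +0000] "GET /b.html HTTP/1.1" 200 5678
EOF
count_missing "$sample"   # prints 1
rm -f "$sample"
```

  Unlike the cut pipeline, this does not depend on the timestamp format, only
on where the quotes fall.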

  On two of my production systems running 2.0.63:

(0:23) www:/tmp# cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l
  743308
(0:24) www:/tmp# wc -l /var/log/httpd-access.log
 4802394 /var/log/httpd-access.log

(0:175) gate:/tmp# cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l
   99583
(0:176) gate:/tmp# wc -l /var/log/httpd-access.log
 3658733 /var/log/httpd-access.log

  On a 2.2.19 test system I just brought up:

(0:36) test:/tmp# cut -d \] -f 2-99 /var/log/httpd-access.log | cut -d \" -f 3-99 | cut -d " " -f 4-99 | grep ^$ | wc -l
     433
(0:37) test:/tmp# wc -l /var/log/httpd-access.log 
    1321 /var/log/httpd-access.log
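
  As a share of total lines, which is what matters for a percentage threshold
like Analog's, the counts above come to roughly 15.5%, 2.7% and 32.8%. A
small sketch of the arithmetic (the helper name is arbitrary):

```shell
#!/bin/sh
# pct MISSING TOTAL: print MISSING as a percentage of TOTAL, one decimal.
pct() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

pct 743308 4802394   # www
pct 99583 3658733    # gate
pct 433 1321         # test
```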

  The test system is particularly interesting as I did NOT copy the Apache
configuration files from a production system - I configured it by editing
the default config files. So this shouldn't be a cut-and-paste error.

        Terry Kennedy             http://www.tmk.com
        terry@tmk.com             New York, NY USA


