cocoon-dev mailing list archives

From Valentin Richter <Valentin.Rich...@raytion.com>
Subject Re: web server log file parsing
Date Sun, 21 Jul 2002 17:15:12 GMT
At 21:59 +0200 on 19.07.2002, Bert Van Kets wrote:
>One of the projects I'm working on is using Cocoon to create a web server
>log analysis tool.  It's perfect for generating the graphs and reports in
>different formats.
>The one thing I'm struggling with is parsing the log files.  There are
>several ways of doing this:
>1. reading the logs for every period request and create a sax stream
>2. parse the unread log files into a database and use that to generate a
>   sax stream.  This is a two stage system.
>
>As log files can grow very large copying it to the database will give me
>not only a large log directory, but also a large database.  Once the data
>is in the database the speed advantages are of course obvious.
>Reading the data from the log files on every request does have a
>disadvantage in processing speed, but does not use gigs of disk space.
>Webtrends does it this way exactly for the same reason.
>
>What is your guys opinion?
>
>Bert

Just to suggest an alternative approach: if your main concern is not developing another log
analysis tool but having nicely formatted reports, with Cocoon doing the necessary work,
then you might use Analog to analyse the log files and feed its output to Cocoon.

Analog

  http://www.analog.cx/

has a configuration command for producing its results in a computer-readable format, i.e.

   OUTPUT COMPUTER
 
which produces a text file with comma- or tab-separated values. See

   http://www.analog.cx/docs/output.html

and

   http://www.analog.cx/docs/compout.html

for details. Unfortunately, there seems to be no XML output option available.
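
Since there is no XML output, a small converter could sit between Analog and Cocoon.
What follows is only a minimal sketch in Python, assuming nothing about the output
beyond "one record per line, fields separated by a tab or comma". The function name
and the element names (analog-output, row, field) are made up for illustration and
are not part of Analog or Cocoon; the real column meanings are described in
compout.html.

```python
import csv
import io
import xml.etree.ElementTree as ET


def delimited_to_xml(text, delimiter="\t"):
    """Wrap each delimited line in a <row> element with one <field> per column."""
    # Element names are hypothetical; pick whatever your stylesheets expect.
    root = ET.Element("analog-output")
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        row = ET.SubElement(root, "row")
        for value in fields:
            # ElementTree escapes &, < and > when the tree is serialised.
            ET.SubElement(row, "field").text = value
    return ET.tostring(root, encoding="unicode")
```

The resulting string could be written to a file and read by Cocoon like any other
XML document; mapping the generic fields to meaningful element names would be a job
for a stylesheet or a second pass.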

Although Analog is blazingly fast (if you disable DNS lookups), I would still recommend not
running it for each request. One strategy which works for us is to rotate the log files daily
at midnight, compress them using gzip, run Analog on the compressed files (which is, by the
way, 2 to 4 times _faster_ than with uncompressed files, probably because it needs
only a tenth of the number of disc accesses) and put the results into a database.
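
To make the nightly steps a bit more concrete, here is a rough sketch in Python of
the two ends of that job: compressing a rotated log file and storing report lines
in a database. It is not our actual setup. The function names, the table layout and
the use of SQLite are assumptions for illustration, and the Analog run in the
middle is left out on purpose because its command line depends on the installation.

```python
import gzip
import shutil
import sqlite3


def compress_log(path):
    """Gzip a rotated log file and return the name of the compressed copy."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Unlike the gzip command-line tool, this leaves the original file in place.
    return gz_path


def store_rows(conn, day, rows):
    """Store one day's report lines, unparsed, keyed by day and line number."""
    # Hypothetical table layout; a real schema would split lines into columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS report (day TEXT, line_no INTEGER, line TEXT)"
    )
    conn.executemany(
        "INSERT INTO report (day, line_no, line) VALUES (?, ?, ?)",
        [(day, i, line) for i, line in enumerate(rows)],
    )
    conn.commit()
```

A cron job at midnight could call compress_log() on the rotated file, run Analog on
the .gz file, and pass the lines of its output to store_rows() together with the date.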

Valentin Richter

Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Düsseldorf
Germany
mailto:valentin.richter@raytion.com
http://www.raytion.com



