tomcat-users mailing list archives

From André Warnier
Subject Re: Some problem of analyzing the tomcat logs
Date Fri, 17 Sep 2010 07:17:27 GMT

In short, and in my opinion, you are re-inventing the wheel.
Numerous open-source programs already exist that analyse web logs and
produce nice-looking graphics from them. They also do the splitting-up work
properly, as long as you feed them the correct log format; their
documentation explains how to do that.
Look up webalizer, awstats, etc.
Also, because these programs are open-source, you can look inside to see how
they do things, if you really want to write your own code.
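
If you do go down that road, a regular expression is usually safer than
split(" "), because it keeps the quoted request line together as one field.
What follows is only a rough sketch in Java, assuming exactly the pattern
'%t "%r" %s "%U"' from the message below. The names AccessLogParser, Entry and
looksLikeStaticResource, and the list of file extensions, are made up for
illustration and do not come from Tomcat or from any of the tools above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogParser {

    // Assumed line layout for '%t "%r" %s "%U"':
    //   [timestamp] "METHOD uri PROTOCOL" status "path"
    private static final Pattern LINE = Pattern.compile(
        "^\\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) \"([^\"]*)\"$");

    // Hypothetical list of extensions treated as static files.
    private static final Pattern STATIC_EXT = Pattern.compile(
        ".*\\.(jpg|jpeg|png|gif|css|js|ico)$");

    /** One parsed log line; the field names are illustrative. */
    public static final class Entry {
        public final String timestamp, method, uri, protocol, path;
        public final int status;

        Entry(String timestamp, String method, String uri,
              String protocol, int status, String path) {
            this.timestamp = timestamp;
            this.method = method;
            this.uri = uri;
            this.protocol = protocol;
            this.status = status;
            this.path = path;
        }
    }

    /** Returns the parsed entry, or null if the line does not fit the layout. */
    public static Entry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new Entry(m.group(1), m.group(2), m.group(3), m.group(4),
                         Integer.parseInt(m.group(5)), m.group(6));
    }

    /** Heuristic only: guesses "static file" from the extension of the path. */
    public static boolean looksLikeStaticResource(String path) {
        return STATIC_EXT.matcher(path.toLowerCase(Locale.ROOT)).matches();
    }

    public static void main(String[] args) {
        Entry e = parse("[17/Sep/2010:11:40:11 +0800] "
            + "\"POST /example/test.jpg HTTP/1.1\" 200 \"/example/test.jpg\"");
        if (e != null) {
            System.out.println(e.method + " " + e.uri + " " + e.status
                + " static=" + looksLikeStaticResource(e.path));
        }
    }
}
```

Lines that do not fit the layout (a malformed request line, for instance)
come back as null rather than being split wrongly. Note that the extension
check cannot tell a servlet such as /example/testServlet from a page; for
that you would need your own list of URL mappings, which the log by itself
does not contain.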

yang Yang wrote:
> Hi:
> I am trying to develop a web-based tool to track page hit counts, user
> session activity, etc. for our own sites.
> I have run into some problems:
> 1) How do I distinguish whether a request target is a page or a resource?
> For example, take the following three log entries (some parts removed):
> #1-> [17/Sep/2010:11:38:26 +0800] "POST /test.jsp?name=test HTTP/1.1" 200
> "test.jsp"
> #2-> [17/Sep/2010:11:40:11 +0800] "POST /example/test.jpg HTTP/1.1" 200
> "/example/test.jpg"
> #3-> [17/Sep/2010:11:44:26 +0800] "POST /example/testServlet HTTP/1.1" 200
> "test.jsp"
> The pattern used in the above logs is: '%t "%r" %s "%U"'.
> Log #1 shows a page request with a parameter; it can be used to calculate
> the most frequently visited pages.
> Log #2 shows a resource request (an image here); it can be used to
> calculate the most frequently visited files.
> Log #3 shows a request with nothing to indicate its type (it is a
> servlet); in fact it is a page.
> That is to say, they are different request types, so how do I distinguish
> them in my code?
> 2) Log parser.
> I can read the log file line by line. But how do I extract the value of
> each attribute?
> They are all on one line. Should I split them using the string.split()
> method? But what if a value itself contains the separator?
> For example, I use split(" ") to split log #1, but the value "POST
> /example/test.jpg HTTP/1.1" will be split as well, and this may be
> inefficient, so I wonder whether there is a tool that makes this easier?
