httpd-dev mailing list archives

From "Paul A. Houle" <p...@cornell.edu>
Subject Re: Do these broken clients still exist?
Date Mon, 04 Apr 2005 13:46:12 GMT
On Sun, 3 Apr 2005 13:58:56 -0400 (Eastern Daylight Time), Joshua Slive  
<joshua@slive.ca> wrote:

> Does someone with a high-traffic, general-interest web site want to take
> a look through their logs for these user-agent strings? I don't mind
> keeping them if they make up even 1/100 of a percent of the traffic, but
> it seems silly to keep these extra regexes on every single request if
> these clients don't exist anymore in the wild.
>
>

	Regexes are pretty cheap for a 'normal' Apache setup.

	In the initial testing of a production server (2x 3.2 GHz Xeon, 6 GB
RAM), we found that, when serving static pages, the overhead of processing
regexes didn't become noticeable until we had more than 1,000 rewrite
rules. Besides, at least 30% of the hits on this server are CGI scripts,
so the regex overhead is really nothing compared to the other ways we
abuse our machine.
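To get a feel for how per-request cost grows with the number of rules, a microbenchmark along the following lines could be used. This is a minimal sketch, not the test we ran: it uses Python's `re` module rather than Apache's own regex engine, so absolute numbers will not transfer, and the `BadBot0000`-style patterns and sample user-agent string are hypothetical. None of the rules match, so every simulated request pays for evaluating all of them (the worst case).

```python
import re
import timeit

def per_request_cost(num_rules,
                     user_agent="Mozilla/5.0 (Windows; U; Windows NT 5.1)",
                     repeats=1000):
    """Average seconds to test one user-agent string against num_rules regexes."""
    # Hypothetical rules: distinct literal tokens that never match the UA,
    # so each request evaluates every rule in turn.
    rules = [re.compile(f"BadBot{i:04d}") for i in range(num_rules)]

    def one_request():
        # Rules are checked one after another, as separate regexes.
        return any(r.search(user_agent) for r in rules)

    total = timeit.timeit(one_request, number=repeats)
    return total / repeats

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, per_request_cost(n))
```

Because each rule is a separate regex, the cost should grow roughly linearly with the rule count; whether that matters depends on how it compares with everything else a request does.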

	While testing, I also noticed that Apache's handling of regexes is
pretty simplistic: each rule is tried on its own. Much of the time you
could consolidate a large stack of regexes into a single state machine,
which could improve performance by factors of hundreds or thousands for
large rule sets. On the other hand, in practice it doesn't really matter.
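As an illustration of the single-state-machine idea, the Aho-Corasick algorithm builds one automaton that finds many literal strings in a single pass over the input. This is a swapped-in technique, offered as a sketch under the assumption that the rules are plain substrings; consolidating arbitrary regexes would need a DFA-based engine, which this does not attempt. The class name and the sample patterns are hypothetical.

```python
from collections import deque

class AhoCorasick:
    """One automaton that matches many literal substrings in a single scan."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state (a trie)
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns recognised on reaching each state
        for p in patterns:
            self._add(p)
        self._build_failure_links()

    def _add(self, pattern):
        # Insert the pattern into the trie, creating states as needed.
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = nxt
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())  # depth-1 states fail to root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target (suffixes).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return every pattern occurrence in text, in order of match end."""
        state = 0
        found = []
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found.extend(self.out[state])
        return found
```

For example, `AhoCorasick(["wget", "curl", "libwww", "www"]).search("libwww-perl/5.803")` reports both `libwww` and `www` after reading each character once, regardless of how many patterns were loaded.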

	The people we inherited this server from left us several very large
regexes, each with a few hundred pipe symbols, that match the UAs of
non-browser clients we don't want using our service. The trouble is that
this kind of regex inevitably mutates into malignant forms once people
start adding parentheses, and we have no documentation for the rules. On
slow days I think about breaking them up into 500-1000 separate rules,
each of which we could in principle comment individually. This wouldn't
really affect our machine's performance under 'real' conditions, though
we could measure the impact with specialized testing.
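One way to get documented rules without giving up a single combined regex is to keep each rule on its own line with a comment and join them mechanically. The sketch below assumes that approach; the rule list, its comments, and the function names are hypothetical, not the rules on our server. Wrapping every rule in a non-capturing group keeps a stray `|` or parenthesis in one rule from leaking into its neighbours.

```python
import re

# Hypothetical blocklist: one rule per entry, each with its own comment.
BLOCKED_UA_RULES = [
    (r"wget", "command-line downloader"),
    (r"curl", "command-line HTTP client"),
    (r"libwww-perl", "Perl HTTP library"),
    (r"EmailSiphon", "address harvester"),
]

def compile_blocklist(rules):
    """Join documented rules into one alternation, each in its own group."""
    return re.compile("|".join(f"(?:{pat})" for pat, _ in rules),
                      re.IGNORECASE)

def matching_rules(rules, user_agent):
    """Return the comment of every rule that matches, for auditing."""
    return [why for pat, why in rules
            if re.search(pat, user_agent, re.IGNORECASE)]

# The combined regex is what would be evaluated on each request.
BLOCKED_UA = compile_blocklist(BLOCKED_UA_RULES)
```

The combined pattern handles the per-request check, while `matching_rules` is only needed when someone asks why a particular client was refused.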
