httpd-dev mailing list archives

From "Paul A. Houle" <p...@cornell.edu>
Subject Re: stress testing of Apache server
Date Wed, 04 May 2005 15:30:39 GMT
On Tue, 03 May 2005 13:51:55 -0700, Paul Querna <chip@force-elite.com>  
wrote:

> Sergey Ten wrote:
>> Hello all,
>>
>> SourceLabs is developing a set of tests (and appropriate workload data)  
>> to
>> perform stress testing of an Apache server using requests for static  
>> HTML
>> pages only. We are interested in getting feedback on our plans from the
>> Apache server community, which has a lot of experience in developing,
>> testing and using the Apache server.
>>

	Although Apache is hardly the fastest web server, it's fast enough at
serving static pages that there are only about 1000 sites in the world
that would be concerned with its performance in that area...

	Ok,  there's one area where I've had trouble with Apache performance,   
and that's in serving very big files.  If you've got a lot of people  
downloading 100 MB files via dialup connections,  the process count can  
get uncomfortably high.  I've tried a number of the 'single process' web  
servers like thttpd and boa,  and generally found they've been too glitchy  
for production work -- a lot of that may involve spooky problems like  
sendfile() misbehavior on Linux.
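
	A rough sketch of the httpd.conf side of that fight, assuming Apache
2.0 with the worker MPM (the numbers below are placeholders, not
recommendations):

    # worker MPM: each connection ties up a thread, not a whole process
    <IfModule worker.c>
        # MaxClients = ServerLimit x ThreadsPerChild
        ServerLimit       16
        ThreadsPerChild   64
        MaxClients      1024
    </IfModule>
    # work around flaky sendfile() behavior on some Linux kernels
    EnableSendfile Off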


>> Information available on the Internet, as well as our own experiments,  
>> make
>> it clear that stressing a web server with requests for static HTML pages
>> requires special care to avoid situations when either network bandwidth  
>> or
>> disk IO become a limiting factor. Thus simply increasing the number of
>> clients (http requests sent) alone is not the appropriate way to stress  
>> the
>> server. We think that use of a special workload data (including  
>> httpd.conf
>> and .htaccess files) will help to execute more code, and as a result,  
>> better
>> stress the server.

       If you've got a big working set,  you're in trouble -- you might be  
able to get a factor of two by software tweaking,  but the answers are:

(i) 64-bit (or PAE) system w/ lots of RAM.
(ii) good storage system:  Ultra320 or Fibre Channel.  Think seriously  
about your RAID configuration.

       Under most circumstances,  it's not difficult to get Apache to  
saturate the Ethernet connection,  so network configuration turns out to  
be quite important.  We've had a Linux system that's been through a lot of  
changes,  and usually when we changed something,  the GigE would revert to  
half duplex mode.  We ended up writing a script that checks that the GigE  
is in the right state after boot completes and beeps my cell phone if it  
isn't.
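
	The script is nothing fancy -- something along these lines, with the
interface name and the alert address being stand-ins for our setup:

    #!/bin/sh
    # post-boot sanity check: did the GigE come up full duplex?
    IF=eth0
    STATE=`ethtool $IF | egrep 'Speed|Duplex'`
    echo "$STATE" | grep -q 'Duplex: Full' || \
        echo "$IF link state: $STATE" | mail -s "$IF not full duplex" pager@example.com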

==================

	Whenever we commission a new server we do some testing on the machine to  
get some idea of what it's capable of.  I don't put a lot of effort into  
'realistic' testing,  but rather do some simple work with ApacheBench.   
Often the answers are pretty ridiculous: for instance, we've got a site
that ranks around 30,000 in Alexa that does maybe 10 hits per second at  
peak times...  We've clocked it doing 4000+ static hits per second w/  
small files,  fewer hits per second for big files because we were  
saturating the GigE.
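
	Numbers like that come out of ApacheBench runs along these lines (the
hostname and file are stand-ins):

    # 100,000 requests, 100 concurrent clients, keep-alives on, small static file
    ab -k -n 100000 -c 100 http://www.example.com/small.html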

	What was useful,  however,  was quantifying the performance effects of  
configuration changes.  For instance,  the Apache documentation warns that  
"ExtendedStatus On" hurts performance.  A little testing showed the effect  
was minor enough that we don't need to worry about it with our workload.
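
	For anyone who wants to repeat that comparison, the httpd.conf side is
just the stock mod_status setup with ExtendedStatus toggled on and off
(the module path and addresses below are from a default 2.0 install --
adjust to taste):

    LoadModule status_module modules/mod_status.so
    ExtendedStatus On
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>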

	Similarly, we found we could put ~1000 rewriting rules in the
httpd.conf file w/o really impacting our system performance.  We found
that simple PHP scripts ran about 10x faster than our CGIs, and that
static pages are about 10x faster than that.
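
	Ratios like that fall out of nothing more sophisticated than pointing
ab at equivalent static, PHP and CGI URLs, e.g. (the URLs are made up):

    # the same trivial payload served three ways;
    # compare the "Requests per second" lines
    for url in /hello.html /hello.php /cgi-bin/hello.cgi; do
        ab -n 10000 -c 50 "http://www.example.com$url" | grep 'Requests per second'
    done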

	We've found tactical microbenchmarking quite useful for resolving our
pub table arguments about engineering decisions that affect Apache
performance.

	Personally,  I'd love to see a series of microbenchmarks that address  
issues like

* Solaris/SPARC vs. Linux/x86 vs. Mac OS X/PPC w/ different MPMs
* Windows vs Linux on the same hardware
* configuration in .htaccess vs. httpd.conf
* working set smaller/larger than RAM
* CGI vs. FastCGI vs. mod_perl
* SATA vs. Ultra320 SCSI for big working sets

	and so on...  It would be nice to have an "Apache tweaker's guide"
that would give people the big picture of what affects Apache
performance under a wide range of conditions -- I don't really need
precise numbers, just a feel for things to within half an order of
magnitude or so.

	It would be nice to have a well-organized website with canned numbers,   
plus tools so I can do these benchmarks easily on my own systems.

===============

	Speaking of performance,  the most frustrating area I've dealt with is  
performance of reverse DNS lookups.  This is another area where the Apache  
manual is less than helpful -- it tells you to "not do it" rather than  
give constructive help in solving problems.
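
	The closest thing to constructive advice I've seen is to keep
HostnameLookups Off in the server and resolve addresses offline when
the logs get processed, e.g.:

    # httpd.conf: no reverse DNS in the request path
    HostnameLookups Off

    # at log-analysis time, resolve the client IPs in one batch
    logresolve < access_log > access_log.resolved

	which doesn't help much if, like us, you actually want the lookups at
request time.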

	We had a server with heisenbug problems running RHEL 3; things
stabilized with a 2.6 mainline kernel.  In the process of dealing with
those problems, we developed diagnostic tools that picked up glitches
in our system that people wouldn't really notice during operations.
(People expect 'the internet' to be a little glitchy, so we don't get
howls when the system is sporadically unavailable for a minute.)

	We found out our system was 'seizing up' and becoming unavailable for
about two minutes every three hours because our DNS provider reloads
the tables on our DNS servers around that time.  It turned out that
nscd, with the out-of-the-box settings for RHEL 3, was making the
problem worse because it was set to use 5 threads -- when you're
resolving 100 or so unique addresses a minute, it's not hard to block
5 threads.
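
	The relevant knobs live in /etc/nscd.conf; the sort of change I mean
is along these lines (the values are illustrative, not tested numbers):

    # /etc/nscd.conf -- more resolver worker threads, longer-lived hosts cache
    threads                 15
    enable-cache            hosts   yes
    positive-time-to-live   hosts   3600
    negative-time-to-live   hosts   20
    suggested-size          hosts   211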

	Problems like this are obscure, and it would be nice to see them
talked about in an "Apache tweaker's guide".
