httpd-dev mailing list archives

From: Justin Erenkrantz <jus...@erenkrantz.com>
Subject: Re: Strange Behavior of Apache 2.0.43 on SPARC MP system
Date: Wed, 12 Feb 2003 18:35:18 GMT
--On Wednesday, February 12, 2003 11:52 AM -0600 Min Xu 
<mxu@cae.wisc.edu> wrote:

> First, I don't think the disk should be the bottleneck in any
> case, because the system has 2GB of memory and Solaris's file
> cache is able to hold all of the file content. top shows the
> following stats:

The amount of memory has nothing to do with the bandwidth available 
to it.  I believe recent Sparcs still only use 133MHz RAM (PC133 at 
best - Sparcs don't use DDR yet, I think).  Since all of the code 
pages will most likely not fit entirely in the CPU cache, some of 
them have to be read from main memory.  IIRC, some versions of the 
UltraSPARC IIIi have a 4MB CPU cache, but I wouldn't be surprised if 
that's not enough (kernel pages would also have to be counted).  (I 
don't know your specifics here.)

So, if you have 14 processors (I think this is what you said you 
had), they will all be contending on the same ~133MHz memory bus.  
The effective memory bus share per processor works out to roughly 
133MHz / 14 = ~9.5MHz.  That's a severe bottleneck if main memory 
sits on the critical path for every process.

All MP Sparcs share the same memory backplane.  That's why you 
hardly ever see performance improvements past 8 CPUs: the memory 
bandwidth kills you (the CPUs are starved for memory).  Moving to a 
NUMA architecture might help, but I don't think that's a feature 
UltraSparc or Solaris supports.  (I hear Linux has experimental NUMA 
support now.)

I'd recommend reading http://www.sunperf.com/perfmontools.html.  You 
should also experiment with mod_mem_cache and mod_disk_cache.
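
In case it's useful, here's roughly what I mean by experimenting 
with the caches - a minimal sketch only, assuming the stock 2.0 
cache modules were built; the module paths and sizes below are 
illustrative, not tuned values:

  LoadModule cache_module     modules/mod_cache.so
  LoadModule mem_cache_module modules/mod_mem_cache.so

  <IfModule mod_mem_cache.c>
      CacheEnable mem /
      # heap set aside for cached objects, in KBytes
      MCacheSize           65536
      MCacheMaxObjectCount 10000
      MCacheMinObjectSize  1
      MCacheMaxObjectSize  65536
  </IfModule>

  # For the mod_disk_cache comparison (served out of the Solaris
  # file cache once warm):
  #   LoadModule disk_cache_module modules/mod_disk_cache.so
  #   CacheEnable disk /
  #   CacheRoot /var/cache/httpd

If the mem cache helps, that points at per-request filesystem and 
open()/close() overhead rather than the memory bus.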

> To test the context switching hypothesis and the backplane
> hypothesis I changed all files in the repository to 2 bytes long,
> that is, an "a" plus an "eof". I reran the experiment, and the
> performance is worse!

There will still be overhead in the OS networking layer.  You are 
using connection keep-alives and pipelining, right?  Given that your 
top output showed a lot of kernel time, I'd bet you are spending a 
lot of time contending on the virtual (loopback) network - which is 
usually the case when you are not using connection keep-alives; the 
TCP stack just gets hammered.  I'd also bet the loopback network is 
not optimized for performance.  (DMA can't be used, and work that 
would otherwise be offloaded to dedicated network hardware has to be 
done on the main CPU.)
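
If keep-alives are somehow off in your config, the relevant 
directives are below (a minimal sketch; the values are just a 
starting point for a benchmark run, not a recommendation).  And if 
you happen to be driving the test with ab, remember that it only 
sends keep-alive requests when given -k.

  KeepAlive On
  # let one connection carry many requests during the run
  MaxKeepAliveRequests 1000
  KeepAliveTimeout 15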

Please stop trying to convince us to pay attention to benchmarks 
where the client and server are on the same machine.  There are just 
too many variables that will screw things up.  The performance 
characteristics change dramatically when they are physically separate 
boxes.

-- justin
