httpd-dev mailing list archives

From Dean Gaudet
Subject Re: ack re: [PATCH] performance improvement
Date Sun, 02 Feb 1997 05:14:58 GMT
On Sat, 1 Feb 1997, Marc Slemko wrote:
> Trouble is we don't have this in main memory to jump through anyway.

Well, the compiler textbook answer (i.e. the dragon book) to this problem
is a buffer pair with sentinels (the sentinel reduces the number of
comparisons in the inner loop).  But mmap() or fread() of the entire file
is way preferable in this case.
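
A rough sketch of the idea, not taken from the patch under discussion:
once the whole file is in one buffer with a spare byte at the end, a
single sentinel is enough and the inner loop needs no bounds check.  The
names slurp() and find_byte() are made up for illustration, and the
4096-byte starting size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from fp into a malloc'd buffer
 * that always has one spare byte past the data for a sentinel.
 * Returns NULL on allocation or read failure. */
char *slurp(FILE *fp, size_t *len_out)
{
    size_t cap = 4096, len = 0;          /* starting size is arbitrary */
    char *buf = malloc(cap + 1);
    if (buf == NULL)
        return NULL;
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;                       /* short read: EOF or error */
        cap *= 2;
        char *tmp = realloc(buf, cap + 1);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}

/* Hypothetical helper: index of the first c in buf[0..len), or len if
 * c does not occur.  buf[len] must be writable: it is overwritten with
 * the sentinel so the loop tests one condition per byte, not two. */
size_t find_byte(char *buf, size_t len, char c)
{
    assert(buf != NULL);
    buf[len] = c;                        /* sentinel guarantees a stop */
    char *p = buf;
    while (*p != c)
        p++;
    return (size_t)(p - buf);
}
```

A caller would do something like buf = slurp(fp, &len) and then
find_byte(buf, len, '<'); a result equal to len means "not found".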

> Are you suggesting seeking back and forth through the file?  You still
> have to copy it into memory, may as well do the copy and compare at the
> same time.

I wasn't picturing seeking; I just didn't state that I expected
sufficiently large chunks of the input to be in memory.

memcpy is typically implemented using 32-bit (or 64-bit) copies plus
some fix-up code to deal with straggling bytes.  It's a lot faster to
copy without comparing.
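
To make the shape of that concrete, here is a minimal sketch of a
word-at-a-time copy with a byte loop for the stragglers.  It is an
illustration only, not how any particular libc does it: copy_words() is
a made-up name, and the fixed-size memcpy() calls inside the loop are
just the portable way to spell an unaligned 32-bit load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: copy n bytes from src to dst (non-overlapping),
 * four bytes at a time, then finish the tail one byte at a time. */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(d != NULL && s != NULL);

    /* Bulk: one 32-bit load and one 32-bit store per iteration. */
    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Tail: the 0 to 3 straggling bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A loop that also has to compare each byte against a delimiter cannot
skip ahead a word at a time in this simple way, which is the point made
above.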


P.S.  Um, you wouldn't believe the amount of energy that went into an
optimized memcpy/memset for the Pentium when WATCOM first got access to a
simulator years ago... Intel had this set of benchmarks that vendors had
to do "well" on before their compilers could be stamped "official" or
something like that.  One of them was a really poor sieve that was small
enough that the initial for() loop to zero the sieve was significant in
the profile.  Our compiler would detect the loop (a general case of
similar loops) and mutate it into a memset().  Our memset() used every
last trick we could squeeze out of Appendix H plus a few more the
simulator hinted at... resulting in something that can quite easily
overrun the memory bandwidth available.  At any rate the moral of this
is... use memcpy() et al. whenever possible 'cause some intern probably
slaved away trying a million mutations of it until he/she got it down to a
minimum of cycles.
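
For what it's worth, the kind of rewrite described in the P.S. looks
roughly like this at the source level.  Both function names are made up,
and this is not the benchmark code itself, just the general pattern of
a zeroing loop and its memset() equivalent.

```c
#include <assert.h>
#include <string.h>

/* The hand-written form: zero the sieve flags one element at a time. */
void clear_loop(char *flags, size_t n)
{
    assert(flags != NULL);
    for (size_t i = 0; i < n; i++)
        flags[i] = 0;
}

/* The equivalent library call, which gets whatever tuning the
 * library's memset() has. */
void clear_memset(char *flags, size_t n)
{
    assert(flags != NULL);
    memset(flags, 0, n);
}
```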
