From "Ralf S. Engelschall" <...@en.muc.de>
Subject Re: Shared memory for all servers?
Date Sun, 25 Aug 1996 09:44:29 GMT
On 25 Aug 1996 03:11:52 +0200 in en.lists.apache-new-httpd you wrote:

> Let me ask a real basic question --- how much time is saved by the
> lookup in this cache, instead of just redoing the translation each
> time?  If the matching only takes a fraction of a microsecond per
> request, a cache may not be worth the trouble.

The external lookup involves:
    - opening the mapfile
    - reading it line by line
    - applying a complete regexp to each line
    - parsing out the key and value of the line
    - comparing the key with the given one and,
      if it matches, returning the value
    - closing the mapfile
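
Roughly sketched in C (the function and the simple "key value" line format
here are just for illustration, not the actual mod_rewrite code):

    #include <stdio.h>
    #include <string.h>

    /* Illustrative sketch only: scan a plain-text mapfile of
       "key value" lines for the given key; return 1 and fill
       `value' on a hit, else 0. */
    static int lookup_map(const char *mapfile, const char *key,
                          char *value, size_t vlen)
    {
        FILE *fp;
        char line[1024], k[512], v[512];
        int found = 0;

        if ((fp = fopen(mapfile, "r")) == NULL)           /* I/O: open */
            return 0;
        while (!found && fgets(line, sizeof(line), fp)) { /* read line by line */
            /* parse key and value out of the line (the real code
               applies a regexp here, which is even more expensive) */
            if (sscanf(line, "%511s %511s", k, v) != 2)
                continue;
            if (strcmp(k, key) == 0) {                    /* compare keys */
                strncpy(value, v, vlen - 1);
                value[vlen - 1] = '\0';
                found = 1;                                /* return the value */
            }
        }
        fclose(fp);                                       /* I/O: close */
        return found;
    }
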
The looked-up values are usually "directory locations" or "parts of URLs" for
webclusters in intranets which use homogeneous URLs. Here _EVERY_ request
leads to a lookup through the external mapfile. In practice each request needs
only one such lookup, because most hyperlinks reference URLs in the same
subtree.

Example: /u/rse/ is my homedir, independent of the current server in the
webcluster. To make this work, every server of this cluster has to check
where the dir "rse" has its _physical_ location. To do so, it has to consult
an external mapfile every time a URL with the /u/rse prefix gets requested.
You see: the external lookup happens _ALWAYS_!
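
To make this concrete, such a setup could look roughly like this (a
hypothetical mapfile and rule, not our actual configuration):

    # map.users -- maps usernames to physical home locations
    rse     /e/host1/home/rse
    foo     /e/host2/home/foo

    # httpd configuration:
    RewriteMap  users  txt:/path/to/map.users
    RewriteRule ^/u/([^/]+)(.*)$  ${users:$1}$2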

The problem is that the maps are external files, so every lookup costs time
due to I/O and regexp usage, while the cache lookup is just a simple
two-level nested for-loop through an in-core cache structure.
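
Schematically (the structure and function names are invented for
illustration, not the actual code), the cache lookup is nothing more than:

    #include <string.h>

    /* Hypothetical in-core cache: a list of maps, each holding
       a list of key/value entries. */
    typedef struct {
        const char *key, *value;
    } cacheentry;
    typedef struct {
        const char *name;      /* name of the map */
        cacheentry *entries;   /* cached key/value pairs */
        int         nentries;
    } cachemap;

    /* Pure memory compares: no open(), no read(), no regexp. */
    static const char *cache_lookup(cachemap *maps, int nmaps,
                                    const char *map, const char *key)
    {
        int i, j;
        for (i = 0; i < nmaps; i++) {               /* loop 1: find the map */
            if (strcmp(maps[i].name, map) != 0)
                continue;
            for (j = 0; j < maps[i].nentries; j++)  /* loop 2: find the key */
                if (strcmp(maps[i].entries[j].key, key) == 0)
                    return maps[i].entries[j].value;
        }
        return NULL;                                /* cache miss */
    }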

So, I think the caching _IS_ useful.

> Also, what are the keys of the cache?  If they're full URLs, it might
> get big enough to swell the size of processes significantly; if it
> gets big enough to make the system swap, when it wouldn't otherwise,
> you might well be better off without it...

No. While it is actually possible to write RewriteRules which look up full
URIs in a database, this would not be useful. In practice the keys are always
just parts of the URI, because one also has to maintain the mapfile ;-). In
the "homogeneous URL paths" situation these mapfiles contain as many entries
as there are users in the webserver cluster. At sd&m (the company I work for
beside my studies) we have approximately 300 users in these files. Far from
causing any problems like swapping etc.

But your question forces me to think about the paranoid webmasters who really
try to set up such "mega-maps". OK, not manually, but perhaps via generation
by scripts. Perhaps I should add a max-entry limit for the cache, i.e. a
maximum number of cacheable values.
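
Enforcing such a limit on insertion would be trivial. Reusing the structs
from the sketch above (again, the names and the bound are invented):

    #define CACHE_MAXENTRIES 1000          /* invented upper bound */

    static void cache_insert(cachemap *m, cacheentry e)
    {
        /* assumes m->entries has room for CACHE_MAXENTRIES slots */
        if (m->nentries >= CACHE_MAXENTRIES)
            return;                        /* cache full: simply don't cache */
        m->entries[m->nentries++] = e;
    }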

> [ This is a bit of a knee-jerk, I know, but I really do break out in
>   hives when I see people making their code significantly hairier in
>   the name of performance, without numbers to show that the code they
>   are "improving" is in fact a significant performance hit.
>   Non-optimizations of this sort are, unfortunately, a proven way to
>   waste enormous amounts of programmer-time... ]

Your question is OK, really ;-) I have no problem with people asking such
things. It is useful...

Greetings,
                                        Ralf S. Engelschall    
                                        rse@engelschall.com
                                        http://www.engelschall.com/~rse
