httpd-dev mailing list archives

From Dean Gaudet <dgau...@arctic.org>
Subject RE: mod_include heuristic
Date Tue, 22 Jul 1997 18:17:30 GMT
I had another idea on how to do this in a more general way.

Attach a "fake" client to r->connection->client; essentially, attach a
big buffer.  Now it's easy to snarf the output of subrequests.  But you
still need to find the headers from subrequests before they're tossed
away.  An API phase called pre_handler that runs during run_sub_req()
would be the best.  This way you're not doing extra work on requests that
are being tossed away; you only do the work on requests that are actually
going to be run.
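
To make the "big buffer" idea concrete, here is a minimal sketch of a growable capture buffer in plain C.  It is not the Apache BUFF API: capture_buf, capture_init, capture_write and capture_free are hypothetical names, and hooking such a buffer into r->connection->client is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the "fake client": a growable in-memory buffer
 * that collects whatever a subrequest would have written to the client. */
typedef struct {
    char  *data;   /* captured bytes, NULL until first write */
    size_t len;    /* bytes captured so far */
    size_t cap;    /* bytes allocated */
} capture_buf;

static void capture_init(capture_buf *b)
{
    assert(b != NULL);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Append n bytes from src.  Returns 0 on success, -1 if memory runs out
 * (the bytes captured so far are left intact in that case). */
static int capture_write(capture_buf *b, const char *src, size_t n)
{
    assert(b != NULL);
    if (n == 0)
        return 0;
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap : 4096;
        while (ncap < b->len + n)
            ncap *= 2;                 /* grow geometrically */
        char *p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

static void capture_free(capture_buf *b)
{
    assert(b != NULL);
    free(b->data);
    capture_init(b);
}
```

Once the subrequest has finished, b.data and b.len hold its complete output, which the parent can emit or discard.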

Now, ok, so you've got your hooks into the subrequests, and you can steal
output.  What do you do with headers?

- the subrequest must have a Last-Modified; if it doesn't, abort the
    entire first pass, otherwise take the max of its Last-Modified and
    whatever you've seen so far
- if the subrequest does not have an ETag, abort the entire first pass,
    otherwise append its ETag to the current ETag
- if the subrequest has an Expires, take the min of the current Expires and
    the subrequest's Expires

Or something like that.
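
The three rules above could be sketched roughly as follows.  Everything here is an assumption for illustration: the struct and function names are hypothetical, header values are taken as already parsed into time_t, and "append the ETag" is done as plain string concatenation, which may not be the right way to form a combined validator.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define NO_TIME ((time_t)-1)   /* marks an absent date header */

/* Running result for the parent document (hypothetical structure). */
typedef struct {
    time_t last_modified;  /* max seen so far, NO_TIME if none yet */
    time_t expires;        /* min seen so far, NO_TIME if none yet */
    char   etag[256];      /* concatenation of subrequest ETags */
    int    aborted;        /* set once the first pass must be abandoned */
} merged_headers;

/* Headers of one subrequest, already parsed (hypothetical structure). */
typedef struct {
    time_t      last_modified;  /* NO_TIME when absent */
    time_t      expires;        /* NO_TIME when absent */
    const char *etag;           /* NULL when absent */
} sub_headers;

static void merged_init(merged_headers *m)
{
    assert(m != NULL);
    m->last_modified = NO_TIME;
    m->expires = NO_TIME;
    m->etag[0] = '\0';
    m->aborted = 0;
}

/* Fold one subrequest into the running result.
 * Returns 0 on success, -1 if the first pass is (or already was) aborted. */
static int merge_subrequest(merged_headers *m, const sub_headers *s)
{
    assert(m != NULL && s != NULL);
    if (m->aborted)
        return -1;

    /* Rules 1 and 2: both validators are mandatory. */
    if (s->last_modified == NO_TIME || s->etag == NULL) {
        m->aborted = 1;
        return -1;
    }

    /* Assumption: abort if the combined ETag would not fit. */
    size_t used = strlen(m->etag);
    size_t need = strlen(s->etag);
    if (used + need + 1 > sizeof m->etag) {
        m->aborted = 1;
        return -1;
    }
    memcpy(m->etag + used, s->etag, need + 1);

    /* Rule 1: keep the newest Last-Modified. */
    if (m->last_modified == NO_TIME || s->last_modified > m->last_modified)
        m->last_modified = s->last_modified;

    /* Rule 3: Expires is optional; keep the earliest one seen. */
    if (s->expires != NO_TIME &&
        (m->expires == NO_TIME || s->expires < m->expires))
        m->expires = s->expires;

    return 0;
}
```

A caller would run merge_subrequest() once per subrequest during the first pass and only emit the merged headers if it never returns -1.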

Recursive requests become challenging.  Needs more thought.

BTW the same gear could be used to do the mod_cgi Content-Length
generation.
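
As a rough illustration of that point: once the whole body sits in a buffer, its length is known before anything is sent, so the header can be generated up front.  format_content_length is a hypothetical helper, not an existing Apache function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a Content-Length header line for a fully buffered body of
 * body_len bytes.  Returns the number of characters written (excluding
 * the terminating NUL), or -1 if out is too small. */
static int format_content_length(char *out, size_t out_size, size_t body_len)
{
    assert(out != NULL);
    int n = snprintf(out, out_size, "Content-Length: %lu\r\n",
                     (unsigned long)body_len);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```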

This solution feels 2.0ish.

Dean

On Tue, 22 Jul 1997, Lars Eilebrecht wrote:

> According to Dean Gaudet:
> 
> > Ok, here is something that I think I would be happy with:  use boyer-moore
> > and mmap() in mod_include to speed it up.  Then use a quick and dirty
> > two-pass heuristic to calculate Last-Modified (and ETag)
> 
> and ideally "Content-MD5" if ContentDigest is enabled.
> 
> > provided that all of the directives are truly static.  The first pass aborts
> > as soon as it encounters something non-static. 
> 
> If there's an <!--#include virtual="foobar.sh" --> somewhere in the document
> you may need to check if it's really static (eg. included as-is) or if it is
> maybe a CGI script.
> This was a problem I didn't solve (was too lazy to solve ;-) when I hacked my
> old NCSA server to output Last-Modified headers with SSIs. I'm now using a
> derived version (written by a friend) on my Apache servers (the infamous
> SSILMHACK, there is a PR for it I think)
>  
> > The end result will probably be about the same performance as the existing
> > mod_include.
> 
> Sounds great.
> 
> [...]
> > The directive "IncludesTwoPassThresh NNN" would indicate that two-pass
> > should be aborted whenever NNN bytes have been read from the inputs...
> > which lets it be disabled, and prevents it from being a problem on large
> > inputs.
> 
> Hmmm... do we really need this? Imagine the following: if a resource
> (a big one) is rarely accessed it doesn't hurt Apache if it has to
> parse it twice. If the resource is frequently accessed it may often be
> cached in a proxy-cache, resulting in fewer hits on the server.
> But if the pass is aborted due to "IncludesTwoPassThresh" it cannot be
> cached. This may result in a higher load on the server (depending on
> how big NNN is).
> Maybe a per-directory directive that completely disables the two-pass
> variant is more useful (eg. "DisableTwoPassIncludes").
> 
> Just some esoteric thoughts... :-)
> 
> ciao... 
> -- 
> Lars Eilebrecht
> sfx@unix-ag.org
> 

