httpd-dev mailing list archives

From Greg Stein <gst...@lyra.org>
Subject Re: Apache caching public-proxy layer...
Date Tue, 10 Oct 2000 07:05:50 GMT
On Mon, Oct 09, 2000 at 08:32:03PM -0700, rbb@covalent.net wrote:
> On Mon, 9 Oct 2000, Greg Stein wrote:
> > On Mon, Oct 09, 2000 at 10:56:54AM -0700, rbb@covalent.net wrote:
> > >...
> > > It should also remove all other filters from the request_rec.  Take the
> > > example of a page that was gzip'd on the way out the first time.  We
> > > cache it, and the next time the data is served, we just want to send it,
> > > it should not be sent through the gzip filter again.
> > 
> > Euh... We shouldn't be encouraging the removal of filters like that. How can
> > the caching system know which to remove, and which to leave? It can't. Who
> > knows what those filters may be doing.
> 
> The caching module would need to inspect the content filters that were run
> for the previous request.  Those filters need to be removed.  We have no
> idea how filters will react if the same data is passed through them
> twice.

How can it "inspect the content filters" ?? It isn't like it can know
everything about every filter.

I don't know how the caching filter is going to work, but it will have to do
it from the standpoint that filters are black boxes.

It seems that some higher-level mechanism will need to manually rebuild a
filter chain based on whether content is cached or not. Consider:

    GEN -> F1 -> F2 -> F3 -> C1 ...

If you cache the value between F1/F2, then the next request can/should skip
GEN and F1 and inject a value directly into F2. Similarly, if the cached
value occurs after F3, then GEN/F1/F2/F3 get skipped and the cache injects
its value directly into C1.

No clue how this happens. A lot of it is based on how the chain is produced
in the first place. Ideally, the cache would simply state "here is my
output" and the system figures out that it doesn't need any additional
filtering. If the cache says "here it is" and the system says "wait. that
must still be translated and gzip'd", then F2 and F3 get added to the chain,
and away you go.
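
To make the skip-and-inject idea concrete, here is a minimal sketch in
Python (not Apache's filter API). Filters are treated as opaque callables,
and `run_chain`, `capture_after`, and the dict-based `cache` are all
hypothetical names made up for illustration:

```python
def run_chain(gen, filters, cache, key, capture_after):
    """Serve one request through GEN -> F1 -> F2 -> ... with one capture point.

    The value is captured after the first `capture_after` filters. On a
    cache hit, GEN and those filters are skipped and the cached value is
    injected into the rest of the chain.
    """
    if key in cache:
        data = cache[key]                  # hit: skip GEN and upstream filters
    else:
        data = gen()                       # miss: generate the content
        for f in filters[:capture_after]:
            data = f(data)
        cache[key] = data                  # capture at the chosen transition
    for f in filters[capture_after:]:      # downstream filters always run
        data = f(data)
    return data
```

With `capture_after=1` this is the F1/F2 case: a hit feeds F2 directly. With
`capture_after=len(filters)` nothing remains to run and the cached value goes
straight to C1.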

Hmm. It would appear that the mechanism to *capture* the value is a filter.
*Fetching* the cached value is a content generator, however. The "type" of
its output would be the "type" that occurred as the output within the filter
chain where the capture took place.
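
A rough Python sketch of that split, assuming a plain dict as the store.
`CaptureFilter` and `CachedGenerator` are hypothetical names, and the "type"
is just a string recorded at the capture point:

```python
class CaptureFilter:
    """Pass-through filter: records what flows past it, changes nothing."""

    def __init__(self, store, key, content_type):
        self.store = store
        self.key = key
        self.content_type = content_type   # "type" of the data at this point

    def __call__(self, data):
        self.store[self.key] = (self.content_type, data)
        return data                        # downstream sees the data unchanged


class CachedGenerator:
    """Content generator that replays a previously captured value."""

    def __init__(self, store, key):
        self.store = store
        self.key = key

    @property
    def content_type(self):
        # The generator's output type is whatever type was captured.
        return self.store[self.key][0]

    def __call__(self):
        return self.store[self.key][1]
```

The point of carrying the type along is that whatever assembles the chain can
compare it against what the client needs and decide which filters are still
missing.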

Note that you could capture multiple copies: at GEN/F1, F1/F2, F2/F3, and
F3/C1. Depending on the client needs, you could select different captured
content for replay to the client. For example, assume that the first client
(causing the capture) wanted gzip'd output, so F3 represents the gzip
filter. You get a gzip'd version at the F3/C1 capture point, and a
non-gzip'd version at F2/F3. When Client B comes along and asks for a
non-gzip'd value, he gets the thing captured at F2/F3. Client C asks for a
gzip'd version and gets F3/C1. Client D wants it in a UTF8 encoding (rather
than Latin-1) and so he gets F1/F2; however, he also wants it gzip'd, so the
system adds a gzip filter into the output filter chain:

    GEN{cache system} -> F{gzip} -> C1 ...

Client B and C just have:

    GEN{cache system} -> C1 ...
    
Client A had the sequence further up, with capture points at each filter
transition.
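
One way to sketch the selection among capture points, in Python. This
assumes each capture is keyed by the tuple of filter names already applied
to it, and that each client's needs can be written as an ordered tuple of
filter names; `pick_capture` and both conventions are hypothetical:

```python
def pick_capture(captures, wanted):
    """Choose the capture that saves the most work for this client.

    captures: dict mapping a tuple of already-applied filter names to the
              value captured at that transition.
    wanted:   tuple of filter names this client's response needs, in order.
    Returns (cached_value, filters_still_to_run).
    """
    best = None
    for applied in captures:
        # A capture is usable only if its filters are a prefix of `wanted`.
        if wanted[:len(applied)] == applied:
            if best is None or len(applied) > len(best):
                best = applied
    if best is None:
        raise KeyError("no usable capture for %r" % (wanted,))
    return captures[best], wanted[len(best):]


# Captures left behind by Client A (placeholder values, F3 being gzip).
captures = {
    (): "GEN/F1 value",
    ("F1",): "F1/F2 value",
    ("F1", "F2"): "F2/F3 value",
    ("F1", "F2", "F3"): "F3/C1 value",
}
```

Here Client B's `("F1", "F2")` and Client C's `("F1", "F2", "F3")` each match
a capture exactly, so nothing is left to run. Client D's `("F1", "F3")`
matches only as far as F1/F2, so the gzip filter F3 is returned as still to
be added.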


What does all this mean? Well, for starters, it probably means that filter
insertion is much more tied to the Content-Type/Encoding of the output than
we are set up for right now. We have always acknowledged that, but have been
leaving it for another day.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
