httpd-dev mailing list archives

From Justin Erenkrantz <jus...@erenkrantz.com>
Subject Re: re-do of proxy request body handling - ready for review
Date Wed, 02 Feb 2005 22:26:07 GMT
--On Wednesday, February 2, 2005 11:38 PM +0200 Graham Leggett 
<minfrin@sharp.fm> wrote:

> If mod_cache was taught to serve a being-cached URL directly from the
> cache (shadowing the real download), there would be no need for parallel
> connections to the backend server while the file is being cached, and no
> load spike.

I don't see any way to implement that cleanly and without lots of undue 
complexity.  Many dragons lie in that direction.

How do we know when another worker has already started to fetch a page?
How do we know whether the response is cacheable at all?
How do we know when the content is completed?

For example, if the response is chunked, there is no way to know what the 
final length is ahead of time.
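
As a rough illustration of the chunked case (a sketch in Python rather than 
httpd code; read_chunked is a hypothetical helper and the sample body is made 
up), the total length only becomes known once the zero-size terminating chunk 
has arrived:

```python
import io

def read_chunked(stream):
    """Yield each chunk's payload from an HTTP/1.1 chunked body (hypothetical helper)."""
    while True:
        size_line = stream.readline()
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # The zero-size chunk ends the body; skip optional trailers
            # up to the final blank line.
            while stream.readline() not in (b"\r\n", b""):
                pass
            return
        data = stream.read(size)
        stream.read(2)  # consume the CRLF that follows the chunk data
        yield data

# Made-up sample body: two chunks followed by the terminating chunk.
body = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")

total = 0
for chunk in read_chunked(body):
    total += len(chunk)  # only a running total; the final size is unknown here
```

A second client shadowing this download could be handed the running total at 
any moment, but never a reliable final length until the last chunk is seen.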

If we're still waiting for the initial response (i.e. the request has 
already been issued but no data has been received yet), then we don't know 
whether the origin server will tack on a Cache-Control: no-store or Vary 
header, or whether there is some other server-driven reason that the 
response won't be cacheable or acceptable to this client.
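
To make the ordering problem concrete, here is a deliberately simplified 
sketch (Python, not mod_cache's actual logic; may_store is a hypothetical 
name, and the real rules cover far more cases).  The point is only that the 
function needs the response headers as input, so it cannot be evaluated 
while we are still waiting on the origin:

```python
def may_store(headers):
    """Very rough storability check on response headers (hypothetical helper).

    Assumes `headers` is a dict keyed by canonical header names.
    """
    cache_control = headers.get("Cache-Control", "")
    # Split "no-store, max-age=60" into directive names: {"no-store", "max-age"}.
    directives = {
        d.strip().split("=", 1)[0].lower()
        for d in cache_control.split(",")
        if d.strip()
    }
    if "no-store" in directives or "private" in directives:
        return False
    # "Vary: *" means no later request can be matched against this response.
    if headers.get("Vary", "").strip() == "*":
        return False
    return True

ok = may_store({"Cache-Control": "max-age=60"})
refused = may_store({"Cache-Control": "no-store"})
```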

Additionally, with this strategy, if the first client to request a page is 
on a slow link, then other clients who are on faster links will be stalled 
while the cached content is stored and then served.

The downside of stalling in the hope that we'll be able to actually serve 
from our cache because another process has made the same request seems much 
worse to me than our current approach.  We could end up making the client 
wait an indefinite amount of time for little advantage.

The current approach introduces no performance penalty for users; its 
downside is additional bandwidth towards the origin server: we essentially 
act as if there were no cache present at all.

I'm also unsure that this strategy would mesh well with mod_disk_cache.  I 
think an entirely new and different provider would have to be written 
(assuming we could surmount the above challenges, which I believe are much 
harder than they look).  mod_disk_cache deliberately doesn't use shared 
memory because it introduces unnecessary complexity to the code. 
mod_disk_cache also delays any indication that it has started to fetch the 
page until content has been received.  In fact, the way mod_disk_cache 
works right now, we have an acceptable race condition: the last worker to 
finish stores its data and overwrites all the instances that came before.
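
A minimal sketch of that last-writer-wins behaviour, in Python rather than 
the module's own code.  It assumes each worker writes to a private temporary 
file and then renames it over the final name; store, the key, and the bodies 
are all made up for illustration:

```python
import os
import tempfile

def store(cache_dir, key, body):
    """Write body to a private temp file, then rename it into place (hypothetical)."""
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(body)
    final_path = os.path.join(cache_dir, key)
    # os.replace overwrites any existing entry, so no locking or shared
    # memory is needed: whoever finishes last simply wins.
    os.replace(tmp_path, final_path)
    return final_path

with tempfile.TemporaryDirectory() as cache_dir:
    store(cache_dir, "page", b"first worker")
    path = store(cache_dir, "page", b"second worker")
    with open(path, "rb") as f:
        winner = f.read()
```

Readers see either the old complete entry or the new complete entry, which 
is why the race is tolerable; the cost is that every worker fetched the 
whole response from the origin.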

I would rather focus on getting mod_cache reliable than rewriting it all 
over again to minimize a relatively rare issue.  If it's that much of a 
problem, many pre-caching/priming strategies are also available.  -- justin
