httpd-dev mailing list archives

From Graham Leggett <minf...@sharp.fm>
Subject Re: re-do of proxy request body handling - ready for review
Date Thu, 03 Feb 2005 07:06:04 GMT
Justin Erenkrantz wrote:

> I don't see any way to implement that cleanly and without lots of undue 
> complexity.  Many dragons lie in that direction.

When I put together the initial framework of mod_cache, solving this 
problem was one of my goals.

> How do we know when another worker has already started to fetch a page?

Because there is an (incomplete) entry in the cache.
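
As a rough illustration only (this is not mod_cache code; the Cache
class and the lookup_or_claim name are made up for the example), the
check could look like this: the first worker to ask for a URL creates
the incomplete entry and becomes the fetcher, and every later worker
finds that entry instead of going to the backend.

```python
import threading


class Cache:
    """Hypothetical in-memory cache keyed by URL (illustration only)."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def lookup_or_claim(self, key):
        """Return (entry, is_fetcher).

        The first caller for a key creates an incomplete entry and is told
        to fetch from the backend. Later callers get the same entry back
        and know a fetch is already under way.
        """
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                return entry, False
            entry = {"complete": False, "chunks": []}
            self._entries[key] = entry
            return entry, True
```

The lookup and the creation happen under one lock, so two workers
cannot both decide they are the fetcher for the same key.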

> How do we even know if the response is cacheable at all?

RFC 2616 tells us.

> How do we know when the content is completed?

Because of a flag in the cache entry telling us.

> For example, if the response is chunked, there is no way to know what 
> the final length is ahead of time.

We have no need to know. The "in progress" cache flag is only going to 
be marked as "complete" when the request is complete. Whether that 
response was chunked, streamed or anything else makes no difference.
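
A minimal sketch of that flag, hypothetical and in Python rather than
anything from the httpd tree (CacheEntry and its method names are
assumptions): the fetching worker appends chunks as they arrive and
marks the entry complete at the end, and readers follow along without
ever needing to know the final length.

```python
import threading


class CacheEntry:
    """Hypothetical cache entry whose body grows until it is marked complete."""

    def __init__(self):
        self.chunks = []
        self.complete = False
        self._cond = threading.Condition()

    def append(self, data):
        """Called by the fetching worker for each piece of the response."""
        with self._cond:
            self.chunks.append(data)
            self._cond.notify_all()

    def finish(self):
        """Called by the fetching worker once the response has ended."""
        with self._cond:
            self.complete = True
            self._cond.notify_all()

    def read_all(self):
        """Yield chunks as they arrive; stop once the entry is complete."""
        i = 0
        while True:
            with self._cond:
                # Wait only when we have caught up with the fetcher.
                while i >= len(self.chunks) and not self.complete:
                    self._cond.wait()
                if i >= len(self.chunks):
                    return
                data = self.chunks[i]
            i += 1
            yield data
```

Each reader keeps its own position and the lock is released before a
chunk is handed over, so a slow reader does not hold up the worker that
is filling the entry.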

> If we're still waiting for the initial response (i.e. request has 
> already been issued but no data received back yet), then we don't know 
> if the origin server will tack on a Cache-Control: no-store or Vary or 
> there is some other server-driven reason that it won't be cached or 
> acceptable to this client.

As the cache was designed to cache multiple variants of the same URL, 
Vary should not be a problem. If we are still waiting for the initial 
response, then we have no cache object yet - the race condition is still 
there, but a few orders of magnitude shorter in duration.
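
To make the variant idea concrete, here is a sketch under stated
assumptions (variant_key is a made-up helper, not the mod_cache
implementation, and it does not handle "Vary: *"): the cache key is the
URL plus the values of whichever request headers the response named in
Vary.

```python
def variant_key(url, vary, request_headers):
    """Build a cache key from the URL and the request headers named in Vary.

    Header names are compared case-insensitively and sorted, so the order
    in which the origin lists them in Vary does not matter.
    """
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url, tuple((n, lowered.get(n, "")) for n in names))
```

Two requests for the same URL that differ in a header listed in Vary
get different keys and so different cache entries. Until the first
response headers arrive the Vary value is unknown, which is the short
race window mentioned above.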

> Additionally, with this strategy, if the first client to request a page 
> is on a slow link, then other clients who are on faster links will be 
> stalled while the cached content is stored and then served.

If this is happening now then it's a design flaw in mod_cache.

The cache should fill as fast as the sender can deliver, and the client 
should be able to read as slowly as it likes.

This is important to ensure backend servers are not left hanging around 
waiting for slow frontend clients.

> The downside of stalling in the hope that we'll be able to actually 
> serve from our cache because another process has made the same request 
> seems much worse to me than our current approach.  We could end up 
> making the client wait an indefinite amount of time for little advantage.

There have been outstanding bug reports against mod_proxy v1.3 about 
this issue - the advantage of fixing it is real.

> The downside of the current approach is that we introduce no performance 
> penalty to the users at the expense of additional bandwidth towards the 
> origin server: we essentially act as if there was no cache present at all.

But we introduce a performance penalty to the backend server, which must 
now handle load spikes from clients. This problem can have, and has been 
reported in the past to have, a significant impact on big sites.

> I would rather focus on getting mod_cache reliable than rewriting it all 
> over again to minimize a relatively rare issue.  If it's that much of a 
> problem, many pre-caching/priming strategies are also available.  -- justin

Nobody is expecting a rewrite of the cache, and this issue is definitely 
not rare. I'll start looking at this when I have finished getting the 
LDAP stuff done.

Regards,
Graham
--
