httpd-dev mailing list archives

From "Sergio Leonardi" <sleona...@etnoteam.it>
Subject RE: [PATCH] mod_disk_cache deterministic tempfiles
Date Fri, 19 Aug 2005 10:39:21 GMT
Hi all,
I have been thinking about a possible solution to these problems; let me
know if it makes sense. Please note that I may suggest things that are
already implemented, because I have not yet had time to study the module
extensively.

The cache module can be split into two parts to implement a
producer-consumer model:
1 - mod_cache: the "front end" module, which accepts requests and serves
content from the cache
2 - mod_cache_requester: the "back end" module, which accepts requests from
mod_cache whenever it needs pages regenerated; it makes use of the
regeneration timeouts discussed below

Nothing new so far; let me go deeper.

Whenever mod_cache needs a page generated, it contacts mod_cache_requester
and inserts the required data into a two-level priority queue:
level 1: reserved for requests generated by browsers
level 2: reserved for requests generated by internal cache regeneration
(e.g. because the expiration time is approaching)

In the data structure used, we can put all incoming requests for the same
URL in the same "request block" (similar to what file system drivers do),
to keep track of which processes/threads requested it, so that they can be
contacted once content generation has completed.
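
A minimal sketch of the queue and request blocks described above, in Python
for brevity. All names here (RequestBlock, RequestQueue, submit, next_block)
are hypothetical and nothing below is existing mod_cache code. It assumes a
block keeps the level of its first request; escalation is not handled here.

```python
from dataclasses import dataclass, field

BROWSER = 1        # level 1: requests generated by browsers
REGENERATION = 2   # level 2: internal cache regeneration

WAITING = "waiting for generation"
GENERATING = "generating"


@dataclass
class RequestBlock:
    """All pending requests for one URL (hypothetical structure)."""
    url: str
    level: int
    state: str = WAITING
    # Identifiers of the threads/processes to contact when content is ready.
    waiters: list = field(default_factory=list)


class RequestQueue:
    """Two-level priority queue keyed by URL."""

    def __init__(self):
        # dict preserves insertion order, which gives FIFO within a level.
        self._blocks = {}

    def submit(self, url, level, waiter):
        """Add a request, merging it into an existing block for the URL."""
        block = self._blocks.get(url)
        if block is None:
            block = RequestBlock(url, level)
            self._blocks[url] = block
        block.waiters.append(waiter)
        return block

    def next_block(self):
        """Pick the oldest waiting block of the best level and mark it."""
        waiting = [b for b in self._blocks.values() if b.state == WAITING]
        if not waiting:
            return None
        # min() returns the first minimal element, so ties stay FIFO.
        block = min(waiting, key=lambda b: b.level)
        block.state = GENERATING
        return block
```

With this shape, ten browsers asking for the same URL produce one block with
ten waiters, so the back end would see a single request for it.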

For each request block:
1 - mark the request block as "generating" instead of "waiting for
generation"
2 - take the highest-priority request in this request block
3 - use a configurable default overall generation timeout and a default
block-by-block generation timeout (the latter can extend the total time
allowed for as long as the back end keeps sending content data); the
block-by-block timeout is useful for "heavy" content
4 - perform the request to the back end using the above default timeouts
5 - the back end's response header may contain "special" variables that
update the above timeout values before content data is served (allowing
specific pages to have different timeout values, at the discretion of the
back-end programmer/administrator); if present, apply these values and
remove the special variables from the response header
6 - put the content in the cache
7 - contact each mod_cache thread/process waiting on the block to tell it
that the content is ready to be served
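
Steps 3 to 5 could be sketched roughly as follows. This is only one reading
of the timeout rules: it assumes a block of data is accepted if it arrives
either within the overall timeout or within the block-by-block timeout of
the previous block. The header names are invented for illustration, and the
input is a list of (seconds since request start, data) pairs rather than a
real socket, so the logic can be followed without network code.

```python
# Hypothetical header names; the proposal does not name the variables.
OVERALL_HEADER = "X-Cache-Overall-Timeout"
BLOCK_HEADER = "X-Cache-Block-Timeout"


class GenerationTimeout(Exception):
    """Raised when the back end is considered too slow."""


def apply_header_overrides(headers, overall, per_block):
    """Step 5: let the back end override the defaults, then strip the
    special variables so they are not cached or sent to the client."""
    headers = dict(headers)
    overall = float(headers.pop(OVERALL_HEADER, overall))
    per_block = float(headers.pop(BLOCK_HEADER, per_block))
    return headers, overall, per_block


def collect_body(timed_blocks, overall, per_block):
    """Steps 3-4: accept data blocks while the timeouts are respected.

    timed_blocks yields (seconds_since_request_start, data) pairs.
    """
    body = []
    last_arrival = 0.0
    for arrival, data in timed_blocks:
        # A steady stream of blocks may run past the overall timeout.
        deadline = max(overall, last_arrival + per_block)
        if arrival > deadline:
            raise GenerationTimeout(f"block at {arrival}s, deadline {deadline}s")
        body.append(data)
        last_arrival = arrival
    return b"".join(body)
```

For example, with a 10 second overall timeout and a 5 second block timeout,
blocks arriving at 8, 12 and 16 seconds are all accepted, while a block
arriving at 20 seconds after one at 8 seconds is treated as a timeout.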

If a timeout occurs between steps 4 and 6:
1 - contact only the thread/process that issued the current request of this
request block (the one selected in step 2) and tell it that generation has
been given up
2 - remove this request from the request block
3 - if the request block is now empty, remove it from the queue and exit
4 - if the request block is not empty, mark it as "waiting for generation"
again, giving the other requests in the same block a chance to be generated
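
The timeout handling above might look like this. It is a sketch under
assumptions: the queue is a plain dict from URL to block, a block is a dict
with a "requests" list whose first entry is the request selected in step 2,
and notify is a hypothetical callback that reaches a mod_cache
thread/process.

```python
WAITING = "waiting for generation"


def handle_timeout(queue, url, notify):
    """Apply timeout steps 1-4 to the block for `url`.

    Returns the block if it is still queued, or None if it was removed.
    """
    block = queue[url]
    # Steps 1-2: only the request being generated gives up.
    failed = block["requests"].pop(0)
    notify(failed, "generation timed out")
    # Step 3: nothing left to do for this URL.
    if not block["requests"]:
        del queue[url]
        return None
    # Step 4: let the next request of the block have its turn.
    block["state"] = WAITING
    return block
```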


NOTE 1: Obviously, by default level 1 is processed before level 2
(alternatively, the two levels can use separate connection pools to the
back end, with the level 1 pool being the larger).

NOTE 2: If a request is generated at level 2 and is not served before the
content expires, a browser may ask for the content and generate a level 1
request for it. This can be handled by "escalating" the original request
from level 2 to level 1 (using the browser's request data instead of the
data used by internal regeneration).
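
The escalation in NOTE 2 could be sketched as below. The dict layout and
the submit function are hypothetical; the only point shown is that a level 1
request arriving for a URL already queued at level 2 raises the block's
priority and replaces its request data.

```python
BROWSER = 1        # level 1
REGENERATION = 2   # level 2


def submit(queue, url, level, request_data):
    """Queue a request, escalating an existing block when needed."""
    block = queue.get(url)
    if block is None:
        block = {"level": level, "request_data": request_data, "waiters": 1}
        queue[url] = block
        return block
    block["waiters"] += 1
    if level < block["level"]:
        # Escalate: a browser is now waiting, so use its request data
        # instead of the data from internal regeneration.
        block["level"] = level
        block["request_data"] = request_data
    return block
```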

What do you think?
It could be a way to limit requests to the back end in terms of:
- number of requests: avoiding back-end saturation during load peaks
- duplicate requests: avoiding back-end saturation while content is being
regenerated

Please disregard this if it is off topic.
Bye

	Sergio

-----Original Message-----
From: Colm MacCarthaigh [mailto:colm@stdlib.net] 
Sent: Thursday, 18 August 2005 20:10
To: dev@httpd.apache.org
Subject: Re: [PATCH] mod_disk_cache deterministic tempfiles

On Thu, Aug 18, 2005 at 02:00:52PM -0400, Brian Akins wrote:
> Colm MacCarthaigh wrote:
> 
> >So mtime not being recent is no indication of death, it could easily be
> >a trickling download.
> 
> True. But, if the file's mtime has not changed in 120 seconds (for 
> example) the download is probably hung?

120-second stalls aren't uncommon, and there are plenty of overloaded
servers and terrible CGIs that have those kinds of response times. A
major use of mod_cache is to solve just that problem, but any approach
based on a fixed threshold - no matter what number of seconds you pick -
will always introduce inefficiency.

If you pick a value that is too low, you'll never cache slow-to-serve
content. If you pick a value that is too high, you'll end up sending a
lot of requests to the (already slow) backend.

There might be a solution in using the scoreboard.

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net

