httpd-dev mailing list archives

From Justin Erenkrantz <jus...@erenkrantz.com>
Subject Re: RFC: can I make mod_cache per-dir?
Date Wed, 17 Aug 2005 01:13:38 GMT
On Mon, Aug 15, 2005 at 09:55:34AM +0100, Colm MacCarthaigh wrote:
> mod_cache configurability sucks big-time. CacheEnable adds yet another
> location mapping scheme for administrators to deal with, but this scheme
> lacks basic flexibility:
> 
> 	It can't reliably disable caching for a directory. 

As I mentioned in my previous email, we know nothing about directories at the
time that the cache is run.  The problem is identical for any external caches
that have no knowledge of the file paths that the origin server has.

> 	It's about 99.9% useless for a forward proxy configuration. ;-)

I'm not sure why you think that.  Can you expand upon this?

> 	It can't do regex matching, unlike every other part of Apache.

That is fairly trivial to resolve.  However, regexes are extremely expensive.
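For illustration only, here is one way regex support could look. It uses POSIX <regex.h> so the sketch is self-contained; httpd has its own regex wrappers, and the struct and function names here are hypothetical, not mod_cache code. The pattern is compiled once at configuration time. The regexec() call in rule_matches() is what every request would pay, compared with a single strncmp() for a plain prefix.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a CacheEnable-style entry whose URL argument is a regex. */
typedef struct {
    regex_t re;      /* compiled once, at configuration time */
    bool compiled;
} url_regex_rule;

/* Compile the pattern once; returns true on success. */
static bool rule_init(url_regex_rule *rule, const char *pattern)
{
    assert(rule != NULL && pattern != NULL);
    /* REG_NOSUB: only match/no-match is needed, not capture offsets. */
    rule->compiled = regcomp(&rule->re, pattern, REG_EXTENDED | REG_NOSUB) == 0;
    return rule->compiled;
}

/* Per-request check: this regexec() call is the recurring cost. */
static bool rule_matches(const url_regex_rule *rule, const char *url)
{
    assert(rule != NULL && url != NULL);
    return rule->compiled && regexec(&rule->re, url, 0, NULL, 0) == 0;
}

/* Release the compiled pattern (e.g. on config teardown). */
static void rule_free(url_regex_rule *rule)
{
    if (rule->compiled) {
        regfree(&rule->re);
        rule->compiled = false;
    }
}
```

Compiling at config time keeps regcomp() off the request path, but regexec() still runs per request and per rule, which is the expense Justin is warning about.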

> 	It involves some fairly pants linear searches through the url lists, 
> 	which means not a hope of implementing complex configurations while 
> 	keeping the performance mod_cache is supposed to be for :-/

Hmm?
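To make the linear-search complaint concrete, here is a minimal sketch. It assumes each CacheEnable line becomes one (provider, URL prefix) record checked in configuration order, with the first match winning. The struct and function are hypothetical and are not mod_cache's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one "CacheEnable <provider> <url-prefix>" line. */
typedef struct {
    const char *provider;   /* e.g. "disk" or "mem" */
    const char *url_prefix; /* e.g. "/wiki/" */
} cache_enable_entry;

/* Return the provider of the first entry whose prefix matches url, or NULL.
 * Every request may compare against every entry: O(n) prefix comparisons. */
static const char *find_provider(const cache_enable_entry *entries, size_t n,
                                 const char *url)
{
    assert(url != NULL && (entries != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(entries[i].url_prefix);
        if (strncmp(url, entries[i].url_prefix, len) == 0)
            return entries[i].provider;
    }
    return NULL; /* no entry matched: not cached */
}
```

With only "/" configured, the first comparison always succeeds, which is why the common case is cheap. Cost grows with the number of entries for URLs that match late or not at all.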

> I'm guessing that the majority of CacheEnable instances out there in the
> world probably take "/" as their url argument. For this case, the
> changes I've made speed things up. For other cases there is some small
> potential slowdown, for example if you had only:
> 
> 	CacheEnable disk /wiki/
> 
> Previously mod_cache would have done a url match at the handle stage and
> if it didn't match, that would have been that. With this patch, it
> instead looks up the url with the caching provider directly. This has
> two consequences:

I don't understand what you mean.

> 	1. It means all requests are hit with the cost of a lookup 
> 	   in the cache provider, but this shouldn't be expensive.
> 	   It's already what most sites are doing. And even with
> 	   mod_disk_cache it's relatively painless, just a hashcalc
> 	   and an attempt at open(). 
> 
> 	   Either way, the url match functionality at this stage can 
> 	   be added back trivially, but I decided not to in my patch
> 	   because it's so confusing to have.

Hmm?

> 	2. If an admin re-configures with caching enabled for fewer
> 	   locations than they had previously, they have to know to
> 	   either clear the cache or expect that the entities will
> 	   still be served from the cache until they have expired.
> 	   The patch includes a new Caching user guide, for this and 
> 	   other reasons.

Why would the system trigger a match then?  The configuration should block the
requests from being processed.
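Point 1 above describes a disk-cache lookup as "just a hashcalc and an attempt at open()". The following is a minimal sketch of that shape. Loud assumptions: FNV-1a stands in for whatever hash mod_disk_cache really uses, and the flat "<root>/<hex>.header" layout is invented for illustration. Both the function names and the path scheme are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* 64-bit FNV-1a: a stand-in for the real module's key hashing. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Build "<root>/<16 hex digits>.header" for a URL key; false on truncation. */
static bool cache_path(const char *root, const char *url, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s/%016llx.header", root,
                     (unsigned long long)fnv1a64(url));
    return n > 0 && (size_t)n < size;
}

/* A lookup is one hash plus one open(); a failed open() is treated as a miss. */
static bool cache_lookup(const char *root, const char *url)
{
    char path[4096];
    assert(root != NULL && url != NULL);
    if (!cache_path(root, url, path, sizeof path))
        return false;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;   /* miss (or unreadable): fall through to the handler */
    close(fd);          /* a real provider would go on to read the headers */
    return true;
}
```

The per-request cost is one pass over the URL plus one failed or successful open() system call, which is the cost point 1 argues is acceptable for every request.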

> As I was saying, what I've done gets rid of the CacheEnable and
> CacheDisable directives, and instead lets you do this:
> 
> 	# Cache everything to memory, or then disk
> 	CacheContent mem disk
> 
> 	# Cache content for /foo/ to disk only
> 	<Location /foo/>
> 	    CacheContent disk
> 	</Location>
> 
> 	# Don't cache these files at all
> 	<LocationMatch "^/foo/.*\.txt$">
> 	    CacheContent disk off
> 	</LocationMatch>

Should that be 'off' or 'none' or something else?  What's disk doing here?

> 	<Proxy *>
> 	   # Only cache to disk
> 	   CacheContent disk
> 	</Proxy>
> 
> 	<Proxy http://securityupdates/dist/>
> 	    # Don't cache the list of security updates, ever
> 	    CacheContent off
> 	</Proxy>
> 
> 	<VirtualHost foobar>
> 	    # This vhost should never be cached
> 	    CacheContent off
> 	</VirtualHost>	

I do think this is likely a more intuitive way to configure it if we can do it
without impacting overall performance.
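In httpd, settings from nested configuration scopes are combined by a module's per-directory merge function, with the more specific scope layered over the less specific one. A minimal sketch of how a CacheContent merge might behave follows. It assumes the innermost explicit setting replaces the outer one entirely, and the struct, limit, and function names are hypothetical. It does not address Colm's point that this per-dir result is not yet available at the quick-handler stage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROVIDERS 4   /* hypothetical fixed limit for the sketch */

/* Hypothetical per-directory state for a "CacheContent" directive. */
typedef struct {
    int set;                              /* 0 = not set in this scope: inherit */
    int off;                              /* 1 = "CacheContent off" */
    size_t nproviders;
    const char *providers[MAX_PROVIDERS]; /* tried in the listed order */
} cache_content_conf;

/* Merge outer (base) and inner (add) scopes: the innermost explicit setting wins. */
static cache_content_conf merge_cache_content(const cache_content_conf *base,
                                              const cache_content_conf *add)
{
    assert(base != NULL && add != NULL);
    return add->set ? *add : *base;
}

/* Does the merged config allow caching with this provider? Unset means no. */
static int cache_content_allows(const cache_content_conf *conf, const char *provider)
{
    if (!conf->set || conf->off)
        return 0;
    for (size_t i = 0; i < conf->nproviders; i++)
        if (strcmp(conf->providers[i], provider) == 0)
            return 1;
    return 0;
}
```

Under this rule, a server-wide "CacheContent mem disk" merged with <Location /foo/> "CacheContent disk" yields disk only, and "CacheContent off" in a <VirtualHost> disables caching for that vhost.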

> But I'm still not finished, and I'd like some advice on what next. The
> per-dir information isn't available at the quick-handler stage, so the
> mod_cache handle has to rely on per-server config to decide which
> providers to try and use for serving content. (right now I've simply
> hard-coded mem and disk).

Why must they be hard coded?

> There are two options for doing this:
> 
> 	1. Register any providers used by CacheContent at the config
> 	   stage in the per-server conf. Has the advantage of reducing
> 	   the number of directives involved and minimising admin
> 	   confusion. Disadvantages: makes using CacheContent in
> 	   htaccess files a bit iffy, since there would have to be a
> 	   CacheContent directive in the base config files first.
> 	   Controlling the order in which providers are tried would
> 	   also be a bit of a pain.

Why should these be in htaccess?  There should not be any directory walking.

> 	2. Add another directive. "CacheEnable" makes the most
> 	   sense as a name, but it would also be a change in its
> 	   behaviour. So "CacheServe" as a name might be an option.
> 	   This would be a per-server directive that says:
> 
> 		"CacheEnable mem disk"
> 	
> 	   Which would mean serve from memory, or then disk (ie in
> 	   that order) for this server. 
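Option 2's "serve from memory, or then disk" amounts to trying an ordered per-server provider list until one hits. A minimal sketch follows. The provider interface is hypothetical, not mod_cache's real provider API, and the two toy lookups exist only to make the example runnable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider interface: returns 1 and sets *body on a hit. */
typedef struct {
    const char *name;
    int (*lookup)(const char *url, const char **body);
} cache_provider;

/* Toy stand-ins for real providers, for illustration only. */
static int toy_mem_lookup(const char *url, const char **body)
{
    if (strcmp(url, "/hot") == 0) { *body = "from memory"; return 1; }
    return 0;
}

static int toy_disk_lookup(const char *url, const char **body)
{
    if (strcmp(url, "/hot") == 0 || strcmp(url, "/cold") == 0) {
        *body = "from disk";
        return 1;
    }
    return 0;
}

/* Try providers in configured order (e.g. "mem disk"); the first hit wins. */
static const cache_provider *serve_from_cache(const cache_provider *providers,
                                              size_t n, const char *url,
                                              const char **body)
{
    assert(url != NULL && body != NULL);
    for (size_t i = 0; i < n; i++) {
        if (providers[i].lookup(url, body))
            return &providers[i];
    }
    *body = NULL;
    return NULL;  /* miss everywhere: let the normal handler run */
}
```

Because the list is per-server, it is available at the quick-handler stage. This is the property option 2 relies on, and it would replace the hard-coded mem and disk.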
> 
> I vastly prefer option 2 myself, but I'd like to know what hope (if any)
> I have of getting major changes to directives and the basic configuration
> of a module committed. I'd also like people's thoughts on the trade-off of
> not performing a url comparison at the handle stage.

We can't control the handlers.  And, by and large, most of the overhead in
processing the request is already incurred by the time we hit the handlers.
(Anything that touches regexes, like BrowserMatch, is ridiculously expensive.)

But, improving configuration is a worthy goal.  -- justin
