httpd-dev mailing list archives

From Colm MacCarthaigh <c...@stdlib.net>
Subject Re: using proxy/cache for apache mirrors
Date Wed, 07 Dec 2005 01:53:00 GMT
On Tue, Dec 06, 2005 at 08:16:07PM -0500, Joshua Slive wrote:
> Perhaps I should have mentioned off the top that I envision setting 30+ 
> day expiry times on all .gz/.zip/.msi/.jar/etc files under dist/.  These 
> files should never change without being renamed.

This is a double-edged sword, see below ...

> Colm MacCarthaigh wrote:
> >	* It's vastly more complicated than necessary and adds a burden
> >	  to what admins have to manage. Why should they have to worry
> >	  about managing a cache? They're busy enough trying to give us
> >	  free resources in the first place.
> 
> Either you manage the cache or you manage rsync.  I don't really see why 
> one is easier than the other.

Hundreds of projects use rsync; there's one that recommends a cache. For
the most part the either-or is a fallacy: for most mirror operators you
add to their workload :/ It doesn't affect Apache-only mirrors, though
I'm not sure how many of those there are.

> want to use this don't need to.

Sure, but why recommend one method over another so strongly?

> Certainly for mirrors that see themselves as providing large-scale 
> backups, this would not be a good technique.  From the apache.org point 
> of view, people have no way of even finding our recommended mirrors if 
> we are down, so it doesn't really help.  And for frequently requested 
> files, the long expiry time will allow the mirrors to continue to serve 
> them.

The default max expiry respected by mod_cache (CacheMaxExpire) is 1 day,
and this really needs to be lowered in this case :)
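
As a rough illustration of that cap (a sketch only, not mod_cache's actual code): the mod_cache documentation says CacheMaxExpire is enforced even when the origin supplies a later expiry, so the effective freshness lifetime is the smaller of the two. The helper name and constants below are made up for the example.

```python
DAY = 24 * 60 * 60  # seconds

# Documented default for CacheMaxExpire: 86400 seconds (one day).
DEFAULT_CACHE_MAX_EXPIRE = 1 * DAY


def effective_freshness(origin_expiry_s, cache_max_expire_s=DEFAULT_CACHE_MAX_EXPIRE):
    """Hypothetical helper: seconds a cached object is served without
    going back to the origin, given the origin's expiry and the cap."""
    return min(origin_expiry_s, cache_max_expire_s)


if __name__ == "__main__":
    # A 30-day expiry from the origin is still capped at the default one day.
    print(effective_freshness(30 * DAY) // DAY, "day(s)")
    # The 30 days only take effect if the operator raises the cap to match.
    print(effective_freshness(30 * DAY, cache_max_expire_s=30 * DAY) // DAY, "day(s)")
```

So a 30-day expiry set on dist/ does nothing for a mirror running the default cap; the operator has to change CacheMaxExpire for it to matter.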

> >	* mod_cache + mod_proxy is trivially vulnerable to all of the latest
> >	  DNS cache-poisoning trickery, with no easy fix. At the very
> >	  least we should recommend that admins hard-code www.apache.org
> >	  in their /etc/hosts file, and that INFRA get some PI-space and
> >	  guarantee availability at a particular IP address for
> >	  eternity. Or deploy DNSSEC, and insist that mirrors verify the
> >	  records.
> 
> I don't really see how the situation with mod_proxy is any worse than 
> the existing situation in that regard.  It could even be better given 
> that cache expiry times will far exceed rsync frequencies.

That's the problem!  ...

> >	* Next time www.apache.org gets compromised, the exposure
> >	  will be two to four times as great compared to the rsync
> >	  mirrors. CacheMaxExpire can fix this problem though.
> 
> Again, long expiry times seem to make this problem less severe than with 
> rsync.

They make the problem worse. Compromised binaries hang around for 30
days if you do that, and we'd have to track them all down. We don't
have useful logs for many of the mirrors either; their fetches look just
like regular HTTP requests. This is why it's much more dangerous than
rsync.

If we ask operators to increase CacheMaxExpire to 30 days, that means
that my one-time cache-poisoning now gets me a dodgy binary on their
mirror for a full month. With rsync it's only a few hours, if at all,
as it's *considerably* harder to set up a full rsync repository that
will get through most mirroring systems.
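
To put rough numbers on that comparison, here is a minimal sketch, assuming a poisoned copy keeps being served until the mirror next refreshes it. The 6-hour rsync interval is an illustrative assumption standing in for "a few hours"; the 30 days is the expiry under discussion.

```python
HOUR = 60 * 60
DAY = 24 * HOUR

# Assumed for illustration; real mirrors sync on their own schedules.
RSYNC_INTERVAL = 6 * HOUR
# The 30-day cache expiry proposed in this thread.
CACHE_MAX_EXPIRE = 30 * DAY

# Worst case in this simple model: the bad copy lands right after a
# refresh and is served until the next one replaces it.
ratio = CACHE_MAX_EXPIRE / RSYNC_INTERVAL

print(f"cache: {CACHE_MAX_EXPIRE // DAY} days, "
      f"rsync: {RSYNC_INTERVAL // HOUR} hours, "
      f"ratio: {ratio:.0f}x")
```

Under those assumptions the cached copy can stay exposed roughly two orders of magnitude longer than an rsync'd one.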

We need to make the mirrors much more aware of the risk involved.

> Just to explain the reasoning behind this a little: our dist/ directory 
> is rapidly approaching 10GB.  Although I don't have any statistics to 
> back this up, I strongly suspect that a very small portion of that 
> accounts for a very large portion of the downloads.  

A tiny proportion :) Out of that 10GB, we see about 150MB being pulled
daily. 10GB is fairly small in the context of project archives. You'd
be surprised just how many projects are at least 10 times that size!

> The rest gets rsynced to our hundreds of mirrors for no good reason
> (other than backups; but we don't need hundreds of backups). In
> addition, our projects are always clamoring for faster releases --
> they don't want to delay their announcements to wait for mirrors to
> sync.  I know you have "push" ideas for how to solve that, but the
> proxy technique works as well.

We'd have to go all-proxy for that; I don't really have an answer,
though.

> (There are other ways to address these issues, of course.  We could
> stop recruiting mirrors and limit ourselves to a dozen or so more
> reliable mirrors.  But that would be a major change in thinking.)

It would, and less community involvement too :/ 

-- 
Colm MacCárthaigh                        Public Key: colm+pgp@stdlib.net
