httpd-dev mailing list archives

From "William A. Rowe, Jr." <>
Subject Re: using proxy/cache for apache mirrors
Date Wed, 07 Dec 2005 07:18:32 GMT
Joshua Slive wrote:
> [ This really should be on infrastructure; oh well.]
> Perhaps I should have mentioned off the top that I envision setting 30+ 
> day expiry times on all .gz/.zip/.msi/.jar/etc files under dist/.  These 
> files should never change without being renamed.

Ok, it must be 24 hours.  Although they should not change, we should see
a HEAD/If-Modified-Since query at least once per day per file requested to
**ENSURE** that if we strike a file we find to be corrupt/viral/invalid,
it in fact goes poof from the mirrors in some reasonable amount of time.

In fact I'd set it initially to 1 hr, measure, then back it off to 24 hrs
and see if we save any bandwidth/load by not testing it so frequently.  Any
correctly configured machine should not burden us with anything more than
a "still valid?" ping.

Remember that the mod_autoindex results, themselves, can be invalidated
more frequently, which will let mirrors 'catch up' quicker than they do
today.
> Colm MacCarthaigh wrote:
>>     * mod_cache + mod_proxy is trivially vulnerable to all of the latest
>>       DNS cache-poisoning trickery, with no easy fix. At the very
>>       least we should recommend that admins hard-code
>>       in their /etc/hosts file, and that INFRA get some PI-space and
>>       guarantee availability at a particular IP address for
>>       eternity. Or deploy DNSSEC, and insist that mirrors verify the
>>       records.
> I don't really see how the situation with mod_proxy is any worse than 
> the existing situation in that regard.  It could even be better given 
> that cache expiry times will far exceed rsync frequencies.

Do mirrors even validate any server signature for rsync?  If not, this
argument is blowing smoke.  For that matter, we could even endorse the
use of SSL privately to our mirrors on the backend, with server cert
validation to avoid exactly what you describe above, as well as any
number of man-in-the-middle attacks.  In fact, this seems much more
robust than today's rsync in terms of security.
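
On a mirror that amounts to little more than the following (the mod_ssl
proxy directives are real; the CA file path and origin host are my own
assumptions for illustration):

    # validate the origin's server cert before caching anything from it
    # (CA file path and hostname illustrative)
    SSLProxyEngine on
    SSLProxyVerify require
    SSLProxyVerifyDepth 2
    SSLProxyCACertificateFile /etc/ssl/asf-origin-ca.pem

    ProxyPass        /dist/ https://www.apache.org/dist/
    ProxyPassReverse /dist/ https://www.apache.org/dist/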

>>     * We haven't fixed all of the thundering herd problems :/
> Again, with long expiry times, I don't see this as a problem.

Or fix it?

>>     * It's HTTP only. A lot of users use rsync and FTP to fetch
>>       content from a local mirror.
> I generally discourage ftp mirrors.  But yes, they would continue to 
> need to do rsync.

Why?  I'm not certain, but I expect there are ways to play with wget to
fetch only new/changed files.  If not, perhaps it's time to teach wget
some new tricks :)
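
Something along these lines (untested, URL illustrative):

    # -N/--timestamping re-fetches a file only when the server copy is
    # newer than the local one; --mirror implies -r -N -l inf
    wget --mirror --no-parent --no-host-directories \
         http://www.apache.org/dist/

which only pulls files whose Last-Modified beats the local copy, so it is
already most of the way to an rsync-style "only new/changed files" fetch.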

>>     * Next time gets compromised, the exposure
>>       will be two to four times as great compared to the rsync
>>       mirrors. CacheMaxExpire can fix this problem though.
> Again, long expiry times seem to make this problem less severe than with 
> rsync.

But the converse is an issue, see my first point.

> Just to explain the reasoning behind this a little: our dist/ directory 
> is rapidly approaching 10GB.  Although I don't have any statistics to 
> back this up, I strongly suspect that a very small portion of that 
> accounts for a very large portion of the downloads.  The rest gets 
> rsynced to our hundreds of mirrors for no good reason (other than 
> backups; but we don't need hundreds of backups).  In addition, our 
> projects are always clamoring for faster releases -- they don't want to 
> delay their announcements to wait for mirrors to sync.  I know you have 
> "push" ideas for how to solve that, but the proxy technique works as well.
> (There are other ways to address these issues, of course.  We could stop 
> recruiting mirrors and limit ourselves to a dozen or so more reliable 
> mirrors.  But that would be a major change in thinking.)

Or ship more things to more quickly.
