Message-ID: <439637D7.40103@slive.ca>
Date: Tue, 06 Dec 2005 20:16:07 -0500
From: Joshua Slive
To: dev@httpd.apache.org
Subject: Re: using proxy/cache for apache mirrors
References: <4395FFA1.4040209@slive.ca> <20051206220542.GA8467@dochas.stdlib.net>
In-Reply-To: <20051206220542.GA8467@dochas.stdlib.net>

[This really should be on infrastructure; oh well.]

Perhaps I should have mentioned off the top that I envision setting
30+ day expiry times on all .gz/.zip/.msi/.jar/etc files under dist/.
These files should never change without being renamed.

Colm MacCarthaigh wrote:

> * It's vastly more complicated than necessary and adds a burden
>   to what admins have to manage. Why should they have to worry
>   about managing a cache? They're busy enough trying to give us
>   free resources in the first place.

Either you manage the cache or you manage rsync. I don't really see
why one is easier than the other.

> * It adds massive dependencies to what a mirror server needs to run.
>   Adding modules, especially proxy, is not resource-free. These
>   things eat memory, research time and security work.

We're not going to get rid of rsync mirroring, so mirrors that don't
want to use this don't need to.

>
> * It defeats a huge part of the point of having a mirroring
>   system in the first place. Mirroring isn't just a way of
>   decreasing bandwidth usage on the primary, it's also a means
>   of building content resilience. When www.apache.org goes down, users
>   want their mirror to work. And worst of all, in the case of
>   infrequently used mirrors, this is exactly when they'll
>   suddenly get a lot of queries - all of which will end up in IOWAIT
>   land, with a boat-load of back-end TCP connections, and no
>   content served. That really sucks, for both them and their
>   users.

Certainly for mirrors that see themselves as providing large-scale
backups, this would not be a good technique.
From the apache.org point of view, people have no way of even finding
our recommended mirrors if we are down, so it doesn't really help. And
for frequently requested files, the long expiry time will allow the
mirrors to continue to serve them.

> * mod_cache + mod_proxy is trivially vulnerable to all of the latest
>   DNS cache-poisoning trickery, with no easy fix. At the very
>   least we should recommend that admins hard-code www.apache.org
>   in their /etc/hosts file, and that INFRA get some PI-space and
>   guarantee availability at a particular IP address for
>   eternity. Or deploy DNSSEC, and insist that mirrors verify the
>   records.

I don't really see how the situation with mod_proxy is any worse than
the existing situation in that regard. It could even be better given
that cache expiry times will far exceed rsync frequencies.

> * We haven't fixed all of the thundering herd problems :/

Again, with long expiry times, I don't see this as a problem.

>
> * It's HTTP only. A lot of users use rsync and FTP to fetch
>   content from a local mirror.

I generally discourage FTP mirrors. But yes, they would continue to
need to do rsync.

> * Next time www.apache.org gets compromised, the exposure
>   will be two to four times as great compared to the rsync
>   mirrors. CacheMaxExpire can fix this problem though.

Again, long expiry times seem to make this problem less severe than
with rsync.

Just to explain the reasoning behind this a little: our dist/
directory is rapidly approaching 10GB. Although I don't have any
statistics to back this up, I strongly suspect that a very small
portion of that accounts for a very large portion of the downloads.
The rest gets rsynced to our hundreds of mirrors for no good reason
(other than backups; but we don't need hundreds of backups). In
addition, our projects are always clamoring for faster releases --
they don't want to delay their announcements to wait for mirrors to sync.
I know you have "push" ideas for how to solve that, but the proxy
technique works as well.

(There are other ways to address these issues, of course. We could
stop recruiting mirrors and limit ourselves to a dozen or so more
reliable mirrors. But that would be a major change in thinking.)

Joshua.
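
The expiry policy described at the top of this message (30+ day expiry on release archives under dist/) can be sketched roughly as below. This is only an illustration, not a mirror or httpd configuration: the function name `cache_control_for`, the exact suffix set, the 30-day value, the `public, max-age=...` header form, and the example paths are all assumptions made for the sketch. The "etc" in the suffix list is deliberately left unfilled.

```python
from typing import Optional

# Assumption: only the suffixes named explicitly in the message are listed;
# the trailing "etc" is not guessed at.
LONG_LIVED_SUFFIXES = (".gz", ".zip", ".msi", ".jar")

# Assumption: 30 days, the lower bound of the "30+ day" expiry mentioned.
LONG_EXPIRY_SECONDS = 30 * 24 * 60 * 60


def cache_control_for(path: str) -> Optional[str]:
    """Return a Cache-Control value for release archives under /dist/.

    Files outside /dist/, or without a long-lived suffix, get None,
    meaning "no long expiry is applied by this policy".
    """
    if not path.startswith("/dist/"):
        return None
    if path.lower().endswith(LONG_LIVED_SUFFIXES):
        # Safe only because such files never change without being renamed.
        return f"public, max-age={LONG_EXPIRY_SECONDS}"
    return None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for p in ("/dist/example/example-1.0.tar.gz",
              "/dist/example/README.html",
              "/index.html"):
        print(p, "->", cache_control_for(p))
```

The long lifetime is what lets a caching mirror keep serving popular files while the origin is unreachable; it relies entirely on the rule that a released file is never modified in place.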