www-repository mailing list archives

From "Mark R. Diggory" <mdigg...@latte.harvard.edu>
Subject ASF Repository, closer.cgi and Depot
Date Wed, 14 Jul 2004 14:48:24 GMT
Sorry for the cross post but this seems relevant to both these groups.

I was thinking about the subject of mirroring and redirection for the 
ASF Repository. There was some discussion of this on the Depot list 
recently, and I feel we could address the subject again in the interest 
of both groups.

www.apache.org/dyn/closer.cgi provides a simple resolution strategy to 
attempt to determine the closest mirror available to the client browser. 
It then generates an HTML page via a template that lists the selected 
mirror as well as other available mirrors. With Depot, we have a 
customized download client that could be extended to manage downloading 
from a list of mirrors as well.

Here are my thoughts on this subject:

A.) This script is really not that big (90% of it is just parsing the 
mirrors file), and the database (a flat text file called mirrors.list) 
is not very big either. While closer.cgi is a neat service for 
browsers, it's not exactly helpful for automated clients. Yet 
mirrors.list is an excellent example of metadata that is exposed in an 
effective manner such that automated clients can access it.

http://www.apache.org/mirrors/mirrors.list

I'm somewhat convinced that it would be simple to create a client 
implementation that accomplishes the same functionality as closer.cgi 
programmatically, so that it could be used to resolve a location to 
download from when mirrors are available.
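
To make the idea concrete, here is a rough sketch of what such a client 
could look like. It is not how closer.cgi itself works. The line layout 
assumed for mirrors.list (whitespace-separated protocol, country code 
and URL, with "#" comments) is an assumption that should be checked 
against the real file, and matching on country code is only a crude 
stand-in for "closest". The names Mirror, parse_mirrors and pick_mirror 
are made up for illustration.

```python
from typing import List, NamedTuple


class Mirror(NamedTuple):
    protocol: str  # e.g. "http" or "ftp"
    country: str   # two-letter country code, lower-cased
    url: str       # base URL of the mirror


def parse_mirrors(text: str) -> List[Mirror]:
    """Parse mirror entries from the text of a mirrors.list-style file.

    ASSUMPTION: each data line is 'protocol country url [other fields]'.
    Blank lines, '#' comments and short lines are skipped.
    """
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        mirrors.append(Mirror(fields[0].lower(), fields[1].lower(), fields[2]))
    return mirrors


def pick_mirror(mirrors: List[Mirror], country: str, fallback: str) -> str:
    """Return the first mirror in the client's country, else the fallback URL."""
    wanted = country.lower()
    for mirror in mirrors:
        if mirror.country == wanted:
            return mirror.url
    return fallback
```

A client like Depot could call pick_mirror(parse_mirrors(text), "us", 
"http://www.apache.org/dist/"), where the fallback URL is again only a 
placeholder for whatever the main distribution site should be.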

This would be beneficial to the Apache Bandwidth issue in that if a 
client such as Depot/DownloadManager managed the same capability as 
closer.cgi then:

1.) To determine if the list file has been updated, all one needs to do 
is a HEAD request on the file and review the Last-Modified date, 
downloading it if it is newer than the client's local copy.

2.) Today, Apache server CPU time is spent parsing this file for each 
"closer.cgi" request on the server side. Instead, the client would spend 
the CPU time doing this calculation. After the initial HEAD request to 
check when the mirror list was last updated, no other requests occur to 
www.apache.org in the download process.
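
The check described in point 1 could be sketched roughly as follows. 
This is only an illustration under some assumptions: the local copy's 
file modification time is used as the "client local copy" date, the 
server is assumed to send a Last-Modified header, and the function names 
is_stale and refresh_mirror_list are invented here.

```python
import os
import urllib.request
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MIRRORS_URL = "http://www.apache.org/mirrors/mirrors.list"


def is_stale(last_modified: Optional[str], local_mtime: Optional[float]) -> bool:
    """True if the remote file is newer than the local copy (or we can't tell)."""
    if local_mtime is None or not last_modified:
        return True  # no local copy, or no header to compare against
    remote = parsedate_to_datetime(last_modified)
    if remote.tzinfo is None:
        remote = remote.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    return remote.timestamp() > local_mtime


def refresh_mirror_list(cache_path: str, url: str = MIRRORS_URL) -> bool:
    """HEAD the list, download it only if it is newer; return True if downloaded."""
    local_mtime = os.path.getmtime(cache_path) if os.path.exists(cache_path) else None
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=10) as resp:
        last_modified = resp.headers.get("Last-Modified")
    if not is_stale(last_modified, local_mtime):
        return False
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = resp.read()
    with open(cache_path, "wb") as fh:
        fh.write(data)
    return True
```

When the cached copy is current, the only traffic to www.apache.org is 
the single HEAD request; everything after that goes to the chosen mirror.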

B.) Downfalls?

1.) If such a service were server-side, we do get a centralized way of 
managing it.

But it's difficult to control HTTP client behavior from the server 
beyond the most simplistic of HTTP redirects, and the cost of 
downloading a file becomes much greater because each download request 
has to be redirected through closer.cgi.

2.) Statistics: I guess the benefit that I do see is that one could log 
requests through closer.cgi to track download statistics.

But these again would only be "partial stats" because any browser can 
simply bookmark a mirror and go to it directly. It seems more 
appropriate that a "download stats" tool would operate behind the 
scenes on the mirrors themselves and be aggregated across all of them 
to gain more accuracy in such statistics.


Cheers,
-Mark

