www-infrastructure-issues mailing list archives

From "Samuel Langlois (JIRA)" <j...@apache.org>
Subject [jira] [Created] (INFRA-3903) How to fetch ALL archives from Apache?
Date Thu, 01 Sep 2011 08:52:10 GMT
How to fetch ALL archives from Apache?
--------------------------------------

                 Key: INFRA-3903
                 URL: https://issues.apache.org/jira/browse/INFRA-3903
             Project: Infrastructure
          Issue Type: Task
      Security Level: public (Regular issues)
          Components: Mirrors
            Reporter: Samuel Langlois
            Priority: Minor


I work for Antelink, a French startup, and we are building Antepedia, a database of all the
open source software in the world (www.antepedia.com).
I am requesting your kind help to extend it: we would like to add all the Apache archives,
which are at http://archive.apache.org/dist/
We fetched them from a local mirror, but the mirrors carry only part of the archive.
For instance, this folder contains all releases of commons-httpclient, including 2.0 and 3.0:
http://archive.apache.org/dist/httpcomponents/commons-httpclient/
but this one, on the OVH mirror, contains only the latest release, 3.1:
ftp://mir1.ovh.net/ftp.apache.org/dist/httpcomponents/commons-httpclient/
We could not find a mirror that has everything.
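
To illustrate the kind of gap described above, here is a minimal sketch of how an archive listing could be compared with a mirror listing. It assumes both directory listings have already been fetched as lists of relative paths; the function name and the file names are made up for illustration and are not real directory contents.

```python
def missing_from_mirror(archive_listing, mirror_listing):
    """Return the paths present in the archive but absent from the mirror."""
    return sorted(set(archive_listing) - set(mirror_listing))


# Illustrative file names only (assumption), not real directory listings.
archive = [
    "commons-httpclient-2.0.tar.gz",
    "commons-httpclient-3.0.tar.gz",
    "commons-httpclient-3.1.tar.gz",
]
mirror = ["commons-httpclient-3.1.tar.gz"]

missing = missing_from_mirror(archive, mirror)
print(missing)
```

Anything reported as missing would have to come from archive.apache.org itself.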

I can propose the following solutions:
* we could use wget to crawl all of archive.apache.org, but it would take ages and you would probably
ban us - for good reasons :-)
* you could open rsync access to us - that would also make it easier to pick up new releases later
* we could set up an rsync server on our side, and you could push to it regularly
* we could ship a big hard drive to someone (how big?)
* ... any other idea?
Which one would be easiest for you?
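
As a rough sketch of the rsync option, the snippet below only builds a pull command and prints it, without running anything. The host `rsync.example.org`, the module name `apache-archive`, and the destination path are placeholders (assumptions); whether such rsync access exists is exactly the question asked above.

```python
import shlex


def build_rsync_command(source_url, dest_dir, dry_run=True):
    """Build (but do not run) an rsync command that pulls a tree into dest_dir."""
    cmd = ["rsync", "--archive", "--verbose", "--partial"]
    if dry_run:
        # --dry-run lists what would be transferred without copying anything.
        cmd.append("--dry-run")
    # A trailing slash makes rsync copy the directory contents, not the directory itself.
    src = source_url if source_url.endswith("/") else source_url + "/"
    cmd += [src, dest_dir]
    return cmd


# Placeholder host, module and path (assumptions, not a real endpoint).
cmd = build_rsync_command("rsync://rsync.example.org/apache-archive", "/srv/apache-archive")
print(shlex.join(cmd))
```

Re-running the same command later would transfer only new or changed files, which is why rsync would suit follow-up updates for new releases.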

Thank you very much for your help

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
