archiva-dev mailing list archives

From Marc Lustig ...@marclustig.com>
Subject Re: Proposal: concurrent remote-requests / "ASF Certified Maven Repository"
Date Thu, 15 Oct 2009 08:00:35 GMT

Thanks, Brett, for the input.
I can confirm that with black and white lists it is rather rare for all
remote-repos to be searched sequentially without the artifact being found in
the end. However, it is typical for some scenarios, e.g. when you enable
source-jar downloads for a project. Out of 40 deps, maybe 5 will have
source-jars available. In such a case a simple mvn-goal takes 30 minutes or
more.

I mentioned the timeout just to have a maximum value. Of course the requests
don't usually run into a timeout (except when the repo is down); the average
response time is maybe 3-4 secs (for our installation).
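
To make the numbers concrete, here is a rough sketch of the worst case (artifact found nowhere) for 25 remote-repos. The function names are made up for illustration, and 3.5 secs is an assumed midpoint of the 3-4 secs average; the concurrent figure ignores any thread or connection overhead.

```python
def sequential_worst_case(repo_count: int, seconds_per_request: float) -> float:
    """Total wait when every repo is asked in turn and none has the artifact."""
    return repo_count * seconds_per_request


def concurrent_worst_case(repo_count: int, seconds_per_request: float) -> float:
    """Total wait when all repos are asked at once: bounded by one request."""
    return seconds_per_request if repo_count > 0 else 0.0


REPOS = 25     # number of remote-repos in our setup
TIMEOUT = 5.0  # configured timeout in seconds
AVERAGE = 3.5  # assumption: midpoint of the 3-4 secs average response time

print(sequential_worst_case(REPOS, TIMEOUT))  # 125.0 (every repo times out)
print(sequential_worst_case(REPOS, AVERAGE))  # 87.5 (every repo answers "not found")
print(concurrent_worst_case(REPOS, TIMEOUT))  # 5.0
```

So even without a single timeout, one missing artifact costs well over a minute when the repos are asked one after another.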

Also, it is clear that the first-served concept conflicts with the existing
concept of an (ordered) list of repos that is searched in sequence.
Can we not assume that artifacts with a given spec. are identical whichever
repo they come from, provided the hash matches?
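
A minimal sketch of that check, assuming every copy is compared against one trusted SHA-1 value (Maven repos usually publish a .sha1 file next to the artifact). The function names, repo names and payloads are made up for illustration; this is not Archiva code.

```python
import hashlib
from typing import Dict, List


def sha1_hex(data: bytes) -> str:
    """Hex-encoded SHA-1 digest of the artifact bytes."""
    return hashlib.sha1(data).hexdigest()


def repos_matching(copies: Dict[str, bytes], trusted_sha1: str) -> List[str]:
    """Names of the repos whose copy matches the trusted checksum."""
    wanted = trusted_sha1.strip().lower()
    return [repo for repo, data in copies.items() if sha1_hex(data) == wanted]


# Hypothetical repo names and payloads, purely for illustration.
artifact = b"pretend these are the bytes of foo-1.0.jar"
copies = {
    "repo-a": artifact,
    "repo-b": artifact,
    "repo-c": artifact + b" (tampered)",
}
trusted = sha1_hex(artifact)  # stands in for the published checksum

print(repos_matching(copies, trusted))  # ['repo-a', 'repo-b']
```

If the copies from all candidate repos pass this check, it no longer matters which repo answered first. The open question is where the trusted checksum comes from.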

Btw., this brings up another idea: could the ASF possibly grant "official"
certificates for remote-repos?
In that way, Archiva could distinguish between trusted and non-trusted
repos.
For companies, this would be a compelling feature! I (working for insurance
companies and banks) often hear the argument "oh boy - they are downloading
software from some obscure server in Russia". Having the label "Certified
Maven Repository" would surely quieten those voices :-)
The ASF could release a rule-set that a Maven-repo must conform to in
order to get the "certified" label.
Or even better, the ASF could offer a VMware-image that includes all the
software needed to run the Maven-repo - including some logic to verify that
known artifacts are mirrored correctly. Total control of the repos is not
possible, of course. But the contract between Archiva and the remote repo
could be tightened considerably.


Back to the concurrent requests idea: sending a HEAD request before the
actual GET is surely a good idea. Archiva could decide which repo to send
the GET to based on the shortest response time.
Anyway, this feature needs more brainstorming...
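
As a starting point for that brainstorming, a rough sketch of the probe-then-pick idea. It is only an illustration under assumptions: a "probe" stands in for an HTTP HEAD request and is simulated with a sleep, all names are hypothetical, and nothing here reflects how Archiva is actually implemented. It shows both selection rules: shortest response time, and first hit in the configured order.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

# A probe stands in for an HTTP HEAD request: True means "artifact exists".
Probe = Callable[[], bool]
Results = Dict[str, Tuple[bool, float]]


def timed_probe(probe: Probe) -> Tuple[bool, float]:
    """Run one probe and report (found, elapsed seconds)."""
    start = time.monotonic()
    try:
        found = probe()
    except Exception:
        found = False  # treat an unreachable repo like a miss
    return found, time.monotonic() - start


def probe_all(repos: Dict[str, Probe]) -> Results:
    """Probe every repo at once; the total wait is about the slowest probe."""
    if not repos:
        return {}
    with ThreadPoolExecutor(max_workers=len(repos)) as pool:
        futures = {name: pool.submit(timed_probe, p) for name, p in repos.items()}
        return {name: future.result() for name, future in futures.items()}


def pick_fastest(results: Results) -> Optional[str]:
    """Repo that has the artifact and answered in the shortest time."""
    hits = [(elapsed, name) for name, (found, elapsed) in results.items() if found]
    return min(hits)[1] if hits else None


def pick_first_in_order(results: Results, order: List[str]) -> Optional[str]:
    """Repo that has the artifact and comes first in the configured list."""
    for name in order:
        if results.get(name, (False, 0.0))[0]:
            return name
    return None


def fake_repo(delay: float, has_artifact: bool) -> Probe:
    """Simulated repo: answers after `delay` seconds (illustration only)."""
    def probe() -> bool:
        time.sleep(delay)
        return has_artifact
    return probe


repos = {
    "slow-but-first": fake_repo(0.2, True),
    "missing": fake_repo(0.05, False),
    "fast": fake_repo(0.01, True),
}
results = probe_all(repos)
print(pick_fastest(results))                      # fast
print(pick_first_in_order(results, list(repos)))  # slow-but-first
```

Note that this variant waits for all probes instead of cancelling the rest on the first hit, so the ordered-list rule can be kept while the total wait is still roughly one timeout rather than the sum of all of them.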




brettporter wrote:
> 
> On 15/10/2009, at 12:06 AM, Marc Lustig wrote:
> 
>>
>> Hi all,
>>
>> we have configured about 25 remote-repos for our public-artifacts managed
>> repo.
>> In certain cases, black and white lists don't help and a request is proxied
>> to all 25 remote-repos _sequentially_. Even though we have configured a
>> short timeout of 5 secs, this takes 125 secs in case the artifact doesn't
>> exist in any remote-repo - per artifact!
>>
>> So I was wondering if it would make sense to send requests to all of the
>> remote-repos _concurrently_.
>> The first thread that finds the artifact could cause all the other threads
>> to cancel their http-requests.
>> The total request time would drop from 100 secs++ to merely 5 secs.
>> A tremendous win, or?
>>
>> Has this been discussed before?
> 
> I think this is a pretty unusual case. I don't quite understand why  
> you are hitting the timeout limit on the remote repo - if they are up  
> they should be fast. Also, "first that finds" is different to the  
> current rule since it's first that appears in the list. I worry that  
> in this set up you're not entirely sure which repository the artifacts  
> are meant to be coming from, so maybe it points to another problem.
> 
>> Is there an argument against this strategy?
> 
> Particularly if we turned on streaming of the proxied download to the  
> client (which is intended) - we couldn't do so if they were pooled  
> like this, unless we accepted the "first found rule".
> 
> That said, this might speed up requests with a long list of proxies,  
> even if they are functioning properly. So it might be reasonable as an  
> optional capability. One thing to consider would be doing a HEAD  
> request instead of a GET for all the remotes first to select where to  
> download from, then execute the GET from the desired one.
> 
> - Brett
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Proposal%3A-concurrent-remote-requests-tp25890731p25904406.html
Sent from the archiva-dev mailing list archive at Nabble.com.

