www-repository mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [Plan of action] Setting up an official maven repository for the ASF
Date Thu, 08 Jun 2006 20:43:38 GMT
Niclas Hedhman wrote:
> On Thursday 08 June 2006 08:07, Alex Karasulu wrote:
> 
>> Niclas Hedhman had some very interesting ideas on using RDF with a build
>> system which did not centralize a repository, thereby preventing many of
>> the bottleneck issues we have today with central repos like ibiblio.
>>
>> Perhaps Niclas can elaborate a bit on this ingenious idea.
> 
> The principle was about applying RDF metadata on top of artifacts, including
> authentication and other meta information, available from RDF search engines
> in the distributed fashion in which RDF is designed to work. Design
> points were:
> 
>  * No centralized repository of everything.
>  * Distributed search engines.
>  * Working with existing, published artifacts without participation of the
>    publishers, such as sourceforge.net projects.
>  * Ensured Authenticity.
> 
> Surely Stefano can provide the details on how best to go about doing this.
> 
> Unfortunately, before getting to this stage, I threw in the towel due to
> lack of time.
> 
>> BTW the p2p propagation of artifacts via bittorrent is a great idea.
> 
> Yep. I like that idea a lot as well. Having the users participate in the P2P
> network is not very hard. Maven doesn't need to be involved, and a purely
> voluntary effort of starting a separate daemon would probably give enough
> peers to manage. After all, we are not talking multi-GB downloads (yet) ;o)
> However, how well does P2P perform when the files are small? Would the
> establishment of connections take too long, so we end up with very slow
> downloads?

RDF is a graph data model with a weird XML serialization, and there are
no standards or tools that would help us achieve the above goals, so I
think it is wrong to prescribe what data model to use, especially since
maven is not RDF aware and honestly wouldn't gain much if it were.

                            - o -

The latest trend in terms of content distribution is the "magnet" URI
scheme, a very simple yet radically clever new way of using the web.

 http://en.wikipedia.org/wiki/Magnet:_URI_scheme

Instead of a URL such as the above, you get a magnet URI such as

 magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C

and this is basically a hash that acts as a key into a globally
distributed hashtable (look up DHT if you want to know more).
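Just to make the shape of these URIs concrete, here is a minimal sketch
(plain JDK, nothing maven- or azureus-specific) that pulls the 160-bit
hash out of the example above; the parsing itself is trivial:

  import java.net.URLDecoder;

  public class MagnetExample {
      public static void main(String[] args) throws Exception {
          String magnet = "magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C";
          String query = magnet.substring(magnet.indexOf('?') + 1);
          for (String param : query.split("&")) {
              String[] kv = param.split("=", 2);
              // xt ("exact topic") carries the urn:sha1:<Base32 digest> part
              if (kv.length == 2 && kv[0].equals("xt")) {
                  String urn = URLDecoder.decode(kv[1], "UTF-8");
                  String hash = urn.substring(urn.lastIndexOf(':') + 1);
                  System.out.println("content hash (Base32): " + hash);
              }
          }
      }
  }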

Note how:

 1) there is no protocol description there... the client will connect to
all the networks it knows about and try to find a reference for that
hash. You can find zero, one or more than one. The 160 bits of address
space guarantee a very low probability of collision. Zero means that
your client might not know how to access the network the content is
located on.

 2) there is no need for DNS anymore! In an HTTP URI used as a URL, DNS
provides the first host -> IP translation and the HTTP server provides
the path -> bitstream translation. Here the two are merged into one...
which means that you can publish content even without having a
registered domain, as long as you know how to hook up your content to an
existing transport network and hand off the magnet URI to somebody.

Modern bittorrent clients don't need trackers anymore (trackers are the
equivalent of DNS for bittorrent, letting you ask which IP addresses
currently have the file you are looking for). Azureus, for example, can
restore your torrent even if the tracker is no longer functioning, using
a distributed hashtable to store the SHA1 -> IP information.
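To picture what that distributed hashtable holds, here is a toy
illustration of the data shape only (the addresses are documentation
IPs, and this is not how azureus stores its table internally):

  import java.net.InetSocketAddress;
  import java.util.Map;
  import java.util.Set;

  public class DhtShape {
      public static void main(String[] args) {
          // conceptually: 160-bit content hash -> peers holding that content
          Map<String, Set<InetSocketAddress>> dht = Map.of(
              "YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C",
              Set.of(new InetSocketAddress("203.0.113.7", 6881),
                     new InetSocketAddress("198.51.100.42", 6881)));
          dht.forEach((hash, peers) -> System.out.println(hash + " -> " + peers));
      }
  }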

So, imagine that your maven was an azureus DHT client and that your POM
contains a bunch of magnet URIs for the dependency jars... then what
happens is:

 1) your maven asks the azureus DHT for the IP addresses that have that
SHA1 hash. [note how there is no central point of failure/control in
this system; it's already running with an average of 800k nodes alive
each day, and the code is written in java and open source]

 2) then the bittorrent transfer is initiated between you and all the IP
addresses that have those files and are seeding them or currently
downloading them.
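In code, those two steps would look roughly like the sketch below;
DhtClient and TorrentSession are made-up interfaces standing in for
whatever the real azureus/bittorrent APIs provide, so treat this as a
shape, not an implementation:

  import java.net.InetSocketAddress;
  import java.nio.file.Path;
  import java.util.List;

  // hypothetical interfaces, NOT the real azureus API
  interface DhtClient {
      // step 1: ask the distributed hashtable which peers hold this SHA1
      List<InetSocketAddress> lookup(String sha1Base32);
  }

  interface TorrentSession {
      // step 2: fetch the bits from every peer that has (or is fetching) them
      Path download(String sha1Base32, List<InetSocketAddress> peers);
  }

  class DependencyResolver {
      private final DhtClient dht;
      private final TorrentSession torrent;

      DependencyResolver(DhtClient dht, TorrentSession torrent) {
          this.dht = dht;
          this.torrent = torrent;
      }

      Path resolve(String magnetSha1) {
          List<InetSocketAddress> peers = dht.lookup(magnetSha1); // no central server
          return torrent.download(magnetSha1, peers);             // peer-to-peer transfer
      }
  }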

Since maven jars are very small, the swarm wouldn't really help that
much: your maven client would download a jar before the swarm could even
percolate the information that you are downloading it.

But your maven client might be left 'seeding' the jars that you have
downloaded... basically making every user an instantaneous, transparent
and bandwidth-shaped mirror of the jars that they use. A nice social way
to pay back.

The two missing pieces are:

 1) trust: each pom/jar should be digitally signed, and the signatures
should allow the client to follow the chain of signatures along the
dependencies. Trust implies that you stop at a signature that you trust.
These parameters should, of course, be user configurable (a rough sketch
of the verification step follows below).

 2) search: the azureus DHT is not searchable because it only contains
hash -> IP information and you cannot reconstruct the hash of the POM
from the hash of the jar. Hashes are, by design, opaque, so it's
impossible to scan the azureus DHT for maven packages, get their poms
and provide search over them.
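As for the trust check from point 1, here is a minimal sketch of the
"stop at a signature that you trust" step using the plain java.security
API; the detached-signature layout and the SHA256withRSA algorithm are
assumptions made for the example, not part of any proposal:

  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.security.PublicKey;
  import java.security.Signature;

  public class TrustCheck {
      // verify a detached signature over a jar against a key the user
      // has already marked as trusted
      static boolean isTrusted(Path jar, Path detachedSig, PublicKey trustedKey)
              throws Exception {
          Signature verifier = Signature.getInstance("SHA256withRSA");
          verifier.initVerify(trustedKey);
          verifier.update(Files.readAllBytes(jar));
          return verifier.verify(Files.readAllBytes(detachedSig));
      }
  }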

It is fair to note that there is nothing in the bittorrent architecture
that implies trust or search. This is a design decision and, IMO, a good
one.

-- 
Stefano.

