www-repository mailing list archives

From "Tim Anderson" <...@netspace.net.au>
Subject RE: URI/URL Syntax -- little nits to be aware of
Date Sun, 09 Nov 2003 23:32:13 GMT
> From: Adam R. B. Jack [mailto:ajack@trysybase.com]
> I know URI syntax is dragging on (and I don't know if we are coming to
> consensus or going round and round) but I hope folks are still open eared
> to this stuff, because IMHO the URI/URL syntax may be *the only critical
> thing* we need to determine/document for repository to be at a
> satisfactory phase 1.
> As I believe Roy wrote -- we must include computer parsability into the
> specification. I feel the URI and the resource/file names need to be
> machine parsable so the directories/HTML are metadata in themselves for
> simple/smart tools.


> 2) Version in the filename has its issues also -- e.g.
> "jakarta-servlet-api-4-1.1" -- is that version 4 or 1? (It is 1.1 of
> jakarta-servlet-api-4.)
> 3) Some folks like to use _ not - for such separators. Some also like to
> use periods in resource names. Both make resource parsing hard.

Having the version in the directory path helps here.
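
To illustrate the point, here is a minimal Python sketch. The
<name>/<version>/<file> layout and both function names are assumptions
made for this example, not something the list has agreed on. It shows that
the filename alone admits two readings, while the path gives exactly one.

```python
import posixpath

def candidate_splits(stem):
    """Return every (name, version) split of a filename stem at a hyphen
    where the would-be version part starts with a digit."""
    parts = stem.split("-")
    splits = []
    for i in range(1, len(parts)):
        version = "-".join(parts[i:])
        if version[:1].isdigit():
            splits.append(("-".join(parts[:i]), version))
    return splits

def split_path(path):
    """Read (name, version, filename) from a hypothetical
    <name>/<version>/<file> layout; nothing has to be guessed."""
    directory, filename = posixpath.split(path)
    name_dir, version = posixpath.split(directory)
    return posixpath.basename(name_dir), version, filename

# Filename only: two plausible readings, so a tool has to guess.
print(candidate_splits("jakarta-servlet-api-4-1.1"))

# Version in the directory path: one reading.
print(split_path("jakarta-servlet-api-4/1.1/jakarta-servlet-api-4-1.1.jar"))
```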

> If we wish to parse we either need some convention or separator -- or we
> need to better define the version namespace. Also, whether version is in
> the filename or the directory, how does one 'understand' the version? Is
> 1.1-SNAPSHOT "better" than 1.1, "better" than "-alpha"? If we want to
> process versions we certainly need some sort of specification. [Note:
> metadata in each group could define the version specification/standard,
> etc.]

I believe that's outside the scope.

> BTW: With code specifically trying to "sniff out the right stuff" Ruper2
> is currently able to process all but 35 of the couple of thousand of
> artefacts on Maven's Ibiblio repository. Those 35 have resource name
> formats that break parsing. Maybe we do an 80/20 rule, but it seems a
> real shame not to have 100%.
> BTW: The same parsing issues arise for anything at the end of the
> filename, e.g. -src or -docs. How does one know those aren't some version
> attribute (like -snapshot or -beta)?

Again, having the version in the directory path helps here.
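
As a rough sketch of why, assuming the name and version are already known
from the path (the function name and the example filenames below are
hypothetical): once the "<name>-<version>" prefix is stripped, whatever
remains before the extension can only be a trailing tag such as -src or
-docs, never part of the version.

```python
def trailing_tag(name, version, filename):
    """Return (tag, extension) for a file named
    <name>-<version>[-<tag>].<extension>, where name and version
    were taken from the directory path rather than guessed."""
    prefix = f"{name}-{version}"
    if not filename.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    rest = filename[len(prefix):]
    # The remainder must begin a tag ("-src") or an extension (".jar");
    # anything else means the prefix matched only by accident (1.1 vs 1.10).
    if rest and rest[0] not in "-.":
        raise ValueError(f"{filename!r} does not match version {version!r}")
    tag, _, extension = rest.partition(".")
    return tag.lstrip("-"), extension

print(trailing_tag("foo", "1.1", "foo-1.1.jar"))           # ('', 'jar')
print(trailing_tag("foo", "1.1-beta", "foo-1.1-beta-src.zip"))  # ('src', 'zip')
```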

> I don't know what folks' views are, but I could see we have to break every
> part of the URI down and define/document "best practices" or "standard" in
> order to ensure the URIs are parsable. As such, I believe we ought
> document a URI and URL specification (on Wiki would be my preference, but
> if nobody else volunteers to be secretary, I'd take that one.) I do feel
> strongly that the syntax must be completely computer readable w/o
> additional metadata (at least most of the time.)

+1. But bear in mind that these "best practices" will be language-specific.


> Also, to provide benefit to the users we probably need abstractions such
> as "latest", or groupings such as "all artefacts". Do we work those into
> a URI? Into a URL?
> Making a user come get the "jars" and then come get the "src" or "xml
> resources" (if there are such things) seems rude. A user ought be able to
> say type="all", and get all of them. My experience has been that grouping
> those in one directory is probably easiest for the clients (since they
> don't need metadata to make associations.)

Not necessary, I think, as tools can provide support for this.
If necessary, it could be done via symlinks.
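
A minimal sketch of the symlink idea, assuming one directory per version
under a project directory; the "latest" link name and the function name are
illustrative only. Which version counts as latest is passed in by the
caller, since ordering versions is a separate question.

```python
import os

def point_latest(project_dir, version):
    """Create or replace a relative 'latest' symlink inside project_dir
    that points at the given version directory."""
    link = os.path.join(project_dir, "latest")
    if os.path.islink(link):
        os.remove(link)  # drop the previous link before re-pointing it
    # A relative target keeps the link valid if the repository is mirrored.
    os.symlink(version, link)
    return link
```

This relies on the repository host supporting symlinks; on some platforms
os.symlink needs extra privileges.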

> As such, this pushes one towards one
> directory per group w/ all versions/types in there -- so long as the
> filename is parsable. [I won't lie to you, I don't know what the right
> solution is, sometimes separating is good, sometimes together is good. I
> lean towards the latter.]

-1. I see the repository as providing support for:
1. end users downloading the latest distributions of projects
2. tools accessing artifact dependencies.

Lumping everything in one directory doesn't make it easier for [1].
See http://www.apache.org/dist/httpd/binaries/ for an example.

> Finally, I suspect there will be "other stuff" (other URLs) within a
> repository that do not revert to a resource URI (e.g. metadata files). I
> suspect we have to be able to programmatically exclude those without
> metadata. (e.g. dot files or all files ending with .xml are excluded, or
> ... )

I take the view that everything in the repository is an artifact.
Tools can exclude the artifacts they don't need - there can't be any
language-agnostic support for this without adding metadata.
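
For instance, a tool could apply its own exclusion patterns on the client
side. This is only a sketch: the function name is made up, and the two
patterns are the examples from the quoted text (dot files and files ending
in .xml), not a proposed standard.

```python
from fnmatch import fnmatchcase
import posixpath

def wanted(paths, exclude_patterns):
    """Keep the paths whose final component matches none of the
    tool's own exclusion patterns."""
    kept = []
    for path in paths:
        filename = posixpath.basename(path)
        if not any(fnmatchcase(filename, p) for p in exclude_patterns):
            kept.append(path)
    return kept

listing = [
    "foo/1.1/foo-1.1.jar",
    "foo/1.1/project.xml",
    "foo/1.1/.htaccess",
]
print(wanted(listing, [".*", "*.xml"]))  # ['foo/1.1/foo-1.1.jar']
```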

