archiva-dev mailing list archives

From "nicolas de loof" <nicolas.del...@gmail.com>
Subject Re: [Proposal] Repository Layout Detection/Interaction Changes.
Date Tue, 09 Oct 2007 15:25:54 GMT
Let's take a simple example that breaks in the current Archiva beta:

Archiva is configured to use a default-layout managed repository and proxies
central (the default configuration).

- A Maven 1 client requests "maven/plugins/maven-test-plugin-1.9.jar".

Archiva must convert the incoming request to an ArtifactReference to check for
the file in the managed repository or fetch it from the proxies.
The requested "maven-test-plugin-1.9" must be split into [ artifactId :
version : classifier ].
We have no other information in the incoming request.
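A minimal sketch of what can be read from such a request without guessing. `LegacyPathParts` is a hypothetical helper, not an Archiva class. The groupId, the type directory and the extension are fixed by position, while artifactId and version stay fused in one string:

```java
/**
 * Hypothetical helper (not Archiva code): splits a Maven 1 "legacy" path
 * groupId/typeDir/fileName.ext into the parts that can be read without guessing.
 */
final class LegacyPathParts {
    final String groupId;   // e.g. "maven"
    final String typeDir;   // e.g. "plugins" (the type plus a trailing "s")
    final String baseName;  // e.g. "maven-test-plugin-1.9": artifactId and version still fused
    final String extension; // e.g. "jar"; only the part after the last '.'

    private LegacyPathParts(String groupId, String typeDir, String baseName, String extension) {
        this.groupId = groupId;
        this.typeDir = typeDir;
        this.baseName = baseName;
        this.extension = extension;
    }

    static LegacyPathParts parse(String path) {
        String[] parts = path.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("not a legacy path: " + path);
        }
        String file = parts[2];
        int dot = file.lastIndexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("no extension: " + path);
        }
        return new LegacyPathParts(parts[0], parts[1], file.substring(0, dot), file.substring(dot + 1));
    }
}
```

Everything that remains ambiguous ends up in `baseName`.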

option 1 :
Joakim suggested getting that data from the POM. This cannot apply here, as
we cannot create the PomReference for the same reason we cannot create the
ArtifactReference.

Joakim also proposed reworking the Archiva design to make the
ArtifactReference resolution less central. This would make it easier to
plug in a resolution strategy from the web UI.

option 2 :
I suggested using the current regex-based splitter and checking that the
expected file exists. If it does not, the regex split may be wrong, so build all
possible [ artifactId : version : classifier ] combinations and check each one
for an existing file (the result of this discovery should be cached somewhere
to avoid repeated network cost).
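Option 2 could look roughly like the following sketch. `LegacyNameResolver`, its regex and the `exists` callback are all hypothetical: the regex only stands in for Archiva's current splitter, and the callback stands in for whatever checks the managed repository and the proxies. For brevity it ignores the classifier, and it caches failed lookups as well as successful ones:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of option 2 (not Archiva code): try the regex split first,
 * and if the expected file is missing, check every split at a '-' until one exists.
 * Results, including failures, are cached per file base name.
 */
final class LegacyNameResolver {
    // Stand-in for the current regex: artifactId, '-', then a version starting with a digit.
    private static final Pattern GUESS = Pattern.compile("^(.+?)-(\\d.*)$");

    // exists.test(artifactId, version): does the file exist locally or on a proxy?
    private final BiPredicate<String, String> exists;
    private final Map<String, Optional<String[]>> cache = new ConcurrentHashMap<>();

    LegacyNameResolver(BiPredicate<String, String> exists) {
        this.exists = exists;
    }

    /** Returns {artifactId, version}, or empty if no candidate file exists. */
    Optional<String[]> resolve(String baseName) {
        return cache.computeIfAbsent(baseName, this::discover);
    }

    private Optional<String[]> discover(String baseName) {
        Matcher m = GUESS.matcher(baseName);
        if (m.matches() && exists.test(m.group(1), m.group(2))) {
            return Optional.of(new String[] { m.group(1), m.group(2) });
        }
        // The regex guess failed: try every split at a '-' (skipping a leading one).
        for (int i = baseName.indexOf('-', 1); i >= 0; i = baseName.indexOf('-', i + 1)) {
            String artifactId = baseName.substring(0, i);
            String version = baseName.substring(i + 1);
            if (exists.test(artifactId, version)) {
                return Optional.of(new String[] { artifactId, version });
            }
        }
        return Optional.empty();
    }
}
```

Each existence check may be a proxy round trip, which is why the cache sits in front of the whole discovery rather than in front of individual checks.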

The issue with the current design is that this artifactId:version:classifier
resolution is done in utility classes that have no access to the
repository/proxies. The only way to solve this would be to pass an
ArtifactReferenceVerifier as a parameter that checks whether the file exists,
but this may have an impact on other Archiva components, consumers for example.

I agree with Joakim that the current design is broken, as the
BidirectionalRepositoryLayout interface assumes that mapping a path to an
ArtifactReference is deterministic, and for the legacy layout it is not. So the
API must be changed in some way to reflect this limitation.

option 3 :
Brett suggested having a dedicated UI to maintain a set of exceptions. That
would be a way to force the LegacyBidirectionalRepositoryLayout to be
deterministic. This option has less impact on the Archiva design, but requires:
- a new web UI
- some work for the repository manager
Nico.


2007/10/9, Joakim Erdfelt <joakim@erdfelt.com>:
>
> I think you are missing the core point of the proposal.
>
> (Is Nicolas the only one that understands the difficulty?)
>
> Using *just* the path information, how do you get from a maven 1 request
> to an artifact reference?
> (groupId, artifactId, version, type)
>
> ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
> maven/jars/maven-test-plugin-1.8.2.jar
> org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
>
> These are some examples of the problems that arise.
> Using that magic regex that you mentioned (which we use in archiva too!)
> we get a 1 to many split from path to reference.
>
> (using "|" syntax for examples of references below
> groupId|artifactId|version|type)
>
> ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
>    becomes one of the following
> ch.ethz.ganymed|ganymed|ssh2-build210|jar
> ch.ethz.ganymed|ganymed-ssh2|build210|jar
>
> maven/jars/maven-test-plugin-1.8.2.jar
>    becomes one of the following
> maven|maven|test-plugin-1.8.2|jar
> maven|maven-test-plugin|1.8.2|jar
>
> org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
>    becomes one of the following
> org.apache.geronimo.specs|geronimo|ejb_2.1_spec-1.0.1|jar
> org.apache.geronimo.specs|geronimo-ejb|2.1_spec-1.0.1|jar
> org.apache.geronimo.specs|geronimo-ejb_2.1|spec-1.0.1|jar
> org.apache.geronimo.specs|geronimo-ejb_2.1_spec|1.0.1|jar
>
> The process to get from a maven 1 legacy request to a reference is not
> possible 100% of the time.
> And for the record, there is no code in maven 1 or maven 2 that does
> this (take a path and get an artifact reference). I've looked dozens of
> times; it just doesn't exist.
>
> The initial proposal was to eliminate this 1 to many problem by reading
> the pom file for the information regarding the groupId / artifactId /
> version, but that isn't a valid solution when working with content that
> needs to be proxied.
>
> - Joakim
>
> Brett Porter wrote:
> > It took me a while to digest, but I think this is being over-thought.
> >
> > I have some questions I need answering before I fully understand:
> > - "Discovering versions in a legacy layout. (do we need metadata
> > update / snapshot purge here?)" -- not sure if it's what you meant,
> > but no I don't think we need these so does that mean the problem isn't
> > relevant?
> > - What are the reporting problems you refer to?
> > - Why is classifier not applicable in legacy?
> >
> > For the proposal, it looks ok (with the adjustments Nicolas made), but
> > it also looks like what the code already does?
> >
> > Some more things, just in point form...
> >
> > Since it's problematic, why enumerate the artifact types? In m1, we
> > can determine this purely from the filename extension, so it's easy to
> > detect.
> >
> > "It is nearly impossible to detect, using the path alone, the correct
> > artifactId or version" -- to address this in the regexes for the
> > central repository we have a regex and some special cases. I think
> > that might be suitable in this case (even if they have to be hand
> > configured). The ones we have already determined should take care of
> > central. I don't think you can rely on reading the POM - it may not
> > exist, especially if you are mapping to another m1 repository, as you
> > said. I would be against doing this - it's probably what is causing
> > most of the over-complication I see here.
> >
> > I think the last part is key, as if we aren't re-reading the POM, I
> > believe the code changes you discussed, and the 2 use cases in your
> > second mail are irrelevant, is that right?
> >
> > Just to wrap it up, you are correct about the first use case in your
> > second mail - maven-metadata.xml requests are not in the legacy layout.
> >
> > Cheers,
> > Brett
> >
> >
> > On 05/10/2007, at 11:58 PM, Joakim Erdfelt wrote:
> >
> >>
> >> This is a long email, read it all before commenting, and you'll likely
> >> see a response to your earlier questions. :-)
> >>
> >> I'm currently working on MRM-432 and MRM-519, and I'm in the middle of an
> >> important change to how Archiva handles Layout detection, interaction,
> >> and parsing.
> >>
> >> :Background:
> >>
> >> Layouts in Archiva have 2 main purposes.
> >>
> >>  1. to convert a path to an artifact reference.
> >>  2. to convert an artifact reference to a path.
> >>
> >> Layouts are used by the following.
> >>
> >>  1. The "/repository/${repoid}/" urls use layouts to determine the
> >>     Artifact Reference that the client is requesting.
> >>     The "/repository/" url is layout neutral, and can have maven 1
> >>     clients ask for content in legacy format, or maven 2 clients ask
> >>     for content in default layout.
> >>  2. Proxy requests out to remote repositories utilize layouts to take
> >>     an internal Artifact Reference, convert it to a path appropriate
> >>     to the remote layout configuration and obtain the content that is
> >>     desired.
> >>  3. Simple Consumers utilize layouts to obtain File references, and
> >>     Artifact References to the repository content for purposes of
> >>     operating on the content in a way that they desire.
> >>  4. Complex consumers (such as metadata updater, and snapshots purge)
> >>     utilize layouts to obtain lists of versions and artifacts.
> >>
> >> What Works.
> >>
> >>  * Converting an Artifact Reference to a path.
> >>  * Discovering Versions in a default layout.
> >>    (needed by metadata update / snapshot purge)
> >>  * Converting a default layout path to an Artifact Reference correctly.
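The working direction, reference to path, is plain string assembly. A sketch for both layouts, assuming the caller already knows the base version and the file extension (type and extension are not the same thing, so the legacy variant takes both); `LayoutPaths` is a hypothetical name:

```java
/** Hypothetical sketch (not Archiva code): ArtifactReference fields to a path, for both layouts. */
final class LayoutPaths {
    /** Default layout: dots in groupId become directories; classifier may be null. */
    static String toDefaultPath(String groupId, String artifactId, String baseVersion,
                                String version, String classifier, String extension) {
        StringBuilder sb = new StringBuilder()
                .append(groupId.replace('.', '/')).append('/')
                .append(artifactId).append('/')
                .append(baseVersion).append('/')
                .append(artifactId).append('-').append(version);
        if (classifier != null && !classifier.isEmpty()) {
            sb.append('-').append(classifier);
        }
        return sb.append('.').append(extension).toString();
    }

    /** Legacy layout: groupId stays one directory, followed by the pluralized type directory. */
    static String toLegacyPath(String groupId, String artifactId, String version,
                               String type, String extension) {
        return groupId + "/" + type + "s/" + artifactId + "-" + version + "." + extension;
    }
}
```

No information is lost in this direction, which is why it works while the reverse direction for legacy paths does not.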
> >>
> >> What Doesn't Work.
> >>
> >>  * Detecting the layout in use 100% of the time.
> >>  * Converting a legacy layout path to an Artifact Reference 100% of
> >>    the time.
> >>  * Discovering versions in a legacy layout.
> >>    (do we need metadata update / snapshot purge here?)
> >>  * Reporting problems correctly.
> >>
> >> :The Problem:
> >>
> >> The inability to parse useful information in a consistent way for all
> >> provided paths, i.e. gleaning the following information from the path:
> >>
> >>  * Layout Type (default / legacy)
> >>  * Group ID
> >>  * Artifact ID
> >>  * Version (Deployed version & Base version)
> >>  * Classifier (Not applicable in legacy layout)
> >>  * Type (Not the same as Extension)
> >>
> >> Example Paths: (included in this email for discussion, actual list
> >>                from test cases)
> >>
> >> groupId/jars/-1.0.jar
> >> org.apache.maven.test/jars/artifactId-1.0.war
> >> ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
> >> javax/jars/comm-3.0-u1.jar
> >> javax.persistence/jars/ejb-3.0-public_review.jar
> >> maven/jars/maven-test-plugin-1.8.2.jar
> >> commons-lang/jars/commons-lang-2.1.jar
> >> org.apache.derby/jars/derby-10.2.2.0.jar
> >> com.foo/ejbs/foo-client-1.0.jar
> >> com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar
> >> com.foo.lib/java-sources/foo-lib-2.1-alpha-1-sources.jar
> >> com.foo/jars/foo-tool-1.0.jar
> >> org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
> >> directory-clients/poms/ldap-clients-0.9.1-SNAPSHOT.pom
> >> org.apache.archiva.test/jars/redonkulous-3.1-beta-1-20050831.101112-42.jar
> >>
> >> invalid/invalid/1.0-20050611.123456-1/invalid-1.0-20050611.123456-1.jar
> >> ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar
> >> javax/comm/3.0-u1/comm-3.0-u1.jar
> >> javax/persistence/ejb/3.0-public_review/ejb-3.0-public_review.jar
> >> maven/maven-test-plugin/1.8.2/maven-test-plugin-1.8.2.pom
> >> test/maven-arch/test-arch/2.0.3-SNAPSHOT/test-arch-2.0.3-SNAPSHOT.pom
> >> com/company/department/com.company.department/0.2/com.company.department-0.2.pom
> >> com/company/department/com.company.department.project/0.3/com.company.department.project-0.3.pom
> >> com/foo/foo-tool/1.0/foo-tool-1.0.jar
> >> commons-lang/commons-lang/2.1/commons-lang-2.1.jar
> >> com/foo/foo-client/1.0/foo-client-1.0.jar
> >> com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar
> >> org/apache/archiva/test/redonkulous/3.1-beta-1-SNAPSHOT/redonkulous-3.1-beta-1-20050831.101112-42.jar
> >>
> >>
> >> :Proposal:
> >>
> >> The proposed logic for detecting layout.
> >>
> >>  1. Split the path by directory separators.
> >>  2. If more than 3 parts ( dir/dir/dir/dir/filename ) == default layout.
> >>  3. If fewer than 3 parts ( dir/filename ) == invalid path.
> >>  4. If 3 parts ( dir/dir/filename ) then
> >>     4.1. If the part 2 name ends in "s", then test for a potential legacy layout.
> >>          4.1.1. Identify the filename extension.
> >>          4.1.2. Get the list of potential artifact types for the extension.
> >>          4.1.3. If part 2 (minus the trailing "s") is in the list of
> >>                 artifact types == legacy layout.
> >>     4.2. Otherwise it can't be legacy, so hand off to the default layout.
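A sketch of these detection steps. The extension-to-types table is exactly the hard-to-maintain list discussed in the next paragraph, so here it is simply a parameter supplied by the caller. Class and method names are hypothetical, and paths are assumed to be repository-relative with no leading '/':

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed layout detection steps (not Archiva code). */
final class LayoutDetector {
    enum Layout { DEFAULT, LEGACY, INVALID }

    /**
     * @param path             repository-relative path, no leading '/'
     * @param typesByExtension assumed table, e.g. "jar" -> {"jar", "ejb", "plugin"}
     */
    static Layout detect(String path, Map<String, Set<String>> typesByExtension) {
        String[] parts = path.split("/");                 // step 1
        if (parts.length > 3) {
            return Layout.DEFAULT;                        // step 2
        }
        if (parts.length < 3) {
            return Layout.INVALID;                        // step 3
        }
        String typeDir = parts[1];                        // step 4: dir/dir/filename
        if (typeDir.endsWith("s")) {                      // step 4.1
            String file = parts[2];
            int dot = file.lastIndexOf('.');
            String ext = dot >= 0 ? file.substring(dot + 1) : "";                  // 4.1.1
            Set<String> types =
                    typesByExtension.getOrDefault(ext, Collections.<String>emptySet()); // 4.1.2
            if (types.contains(typeDir.substring(0, typeDir.length() - 1))) {    // 4.1.3
                return Layout.LEGACY;
            }
        }
        return Layout.DEFAULT;                            // step 4.2
    }
}
```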
> >>
> >>  The problem with this approach is maintaining the list of extensions to
> >>  artifact type.  The artifact type is arbitrary, and can be expanded
> >>  upon by the user to include types that we can't even imagine today.
> >>  (See MRM-481: issue with extension .xml.zip)
> >>
> >> The proposed logic for parsing default layout paths.
> >>
> >>  This one is easy.  Paths are in the following format ...
> >>
> >>  ${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}-${classifier}.${type}
> >>
> >>  Once we separate out the directories from the filename, we get the
> >>  following order.
> >>
> >>  dirs[dirs.length-1] = base version.
> >>  dirs[dirs.length-2] = artifact Id.
> >>  dirs[0] thru dirs[dirs.length-3] = groupId (joined with ".").
> >>
> >>  That gives us the crucial pieces in the filename
> >>  ${artifactId}-${version}, which makes detecting the classifier and
> >>  type easy enough.
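A sketch of the default-layout parsing: the last directory is the base version, the one before it the artifactId, and the rest the groupId. The handling of timestamped SNAPSHOT file names (yyyyMMdd.HHmmss-build) is an assumption based on the example paths, and `DefaultPathParser` is a hypothetical name:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of default (Maven 2) layout path parsing; not Archiva code. */
final class DefaultPathParser {
    static final class Parsed {
        final String groupId, artifactId, baseVersion, version, classifier, extension;

        Parsed(String groupId, String artifactId, String baseVersion,
               String version, String classifier, String extension) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.baseVersion = baseVersion;
            this.version = version;
            this.classifier = classifier; // null when the file has no classifier
            this.extension = extension;
        }
    }

    static Parsed parse(String path) {
        String[] parts = path.split("/");
        if (parts.length < 4) {
            throw new IllegalArgumentException("too short for default layout: " + path);
        }
        String[] dirs = Arrays.copyOf(parts, parts.length - 1);
        String filename = parts[parts.length - 1];
        String baseVersion = dirs[dirs.length - 1];
        String artifactId = dirs[dirs.length - 2];
        String groupId = String.join(".", Arrays.copyOf(dirs, dirs.length - 2));

        if (!filename.startsWith(artifactId + "-")) {
            throw new IllegalArgumentException("filename does not start with artifactId: " + path);
        }
        String rest = filename.substring(artifactId.length() + 1);

        String version;
        if (baseVersion.endsWith("-SNAPSHOT")) {
            // Deployed snapshots use either "-SNAPSHOT" or an assumed "yyyyMMdd.HHmmss-build" stamp.
            String prefix = baseVersion.substring(0, baseVersion.length() - "SNAPSHOT".length());
            Matcher m = Pattern.compile(Pattern.quote(prefix) + "(SNAPSHOT|\\d{8}\\.\\d{6}-\\d+)").matcher(rest);
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected snapshot version: " + path);
            }
            version = m.group();
        } else if (rest.startsWith(baseVersion)) {
            version = baseVersion;
        } else {
            throw new IllegalArgumentException("version does not match directory: " + path);
        }

        String tail = rest.substring(version.length()); // "-classifier.ext" or ".ext"
        String classifier = null;
        String extension;
        int dot = tail.indexOf('.');
        if (tail.startsWith("-") && dot > 1) {
            classifier = tail.substring(1, dot);
            extension = tail.substring(dot + 1);
        } else if (tail.startsWith(".")) {
            extension = tail.substring(1);
        } else {
            throw new IllegalArgumentException("cannot find classifier/extension: " + path);
        }
        return new Parsed(groupId, artifactId, baseVersion, version, classifier, extension);
    }
}
```

Taking everything after the first '.' of the tail as the extension keeps compound extensions such as "xml.zip" intact.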
> >>
> >> The proposed logic for parsing legacy layout paths.
> >>
> >>  Legacy layouts are tricky.  It is nearly impossible to detect, using
> >>  the path alone, the correct artifactId or version.  So the process
> >>  will need to read the pom file associated with the artifact Id in order
> >>  to determine the correct Artifact Reference pieces.
> >>
> >>  The problem with this approach is that we now need 2 pieces of
> >>  information, the repository root (location or url) and the path.
> >>  Plus we incur a hit / read of the pom file.
> >>
> >>  So, if we use the pseudo-pattern ...
> >>  [:groupId:]/[:type:]s/[:filename:].[:ext:]
> >>  as a starting point, swap out the [:type:] and [:ext:] for "pom" and
> >>  load the pom from the actual repository to determine the groupId,
> >>  artifactId, and version, we can then have a valid Artifact Reference.
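The path swap itself is simple string work; only reading the pom needs the repository. A hypothetical helper, assuming a single-part extension such as "jar":

```java
/** Hypothetical helper (not Archiva code) for the legacy pom lookup described above. */
final class LegacyPomPath {
    /** "[:groupId:]/[:type:]s/[:filename:].[:ext:]" becomes "[:groupId:]/poms/[:filename:].pom". */
    static String pomPathFor(String legacyPath) {
        String[] parts = legacyPath.split("/");
        if (parts.length != 3 || !parts[1].endsWith("s")) {
            throw new IllegalArgumentException("not a legacy path: " + legacyPath);
        }
        String file = parts[2];
        int dot = file.lastIndexOf('.'); // assumes a single-part extension such as "jar"
        if (dot <= 0) {
            throw new IllegalArgumentException("no extension: " + legacyPath);
        }
        return parts[0] + "/poms/" + file.substring(0, dot) + ".pom";
    }
}
```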
> >>
> >>  The problem with relying on the pom is that it is now required for
> >>  legacy layout "from path" logic, this changes the assumption that poms
> >>  are optional and not required, as well as changing the interface
> >>  to the layout objects to needing a repository as well.
> >>
> >> The proposed changes to the codebase.
> >>
> >>  * Eliminate RepositoryLayoutUtils, roll layout specific filename
> >>    parsing routines into their respective layouts.
> >>  * Eliminate direct usage of BidirectionalRepositoryLayout by
> >>    consumers.
> >>  * Create RepositoryContentRequest that takes the freeform requests
> >>    arriving from the "/repository/" urls and puts them through
> >>    the logic as outlined above.
> >>  * Rename BidirectionalRepositoryLayout interface to RepositoryContent
> >>    to simplify name and represent new role of accessing repository
> >>    content that requires a repository reference.
> >>  * Create DefaultRepositoryContent and LegacyRepositoryContent
> >>    implementations, that utilize techniques described above, and
> >>    logic already present in DefaultBidirectionalRepositoryLayout and
> >>    LegacyBidirectionalRepositoryLayout.
> >>  * Create AnonymousProjectReader that takes a File object pointing to
> >>    a pom, reads the <pomVersion> or <modelVersion> elements and loads
> >>    the pom information as appropriate.
> >>  * Create RepositoryContentFactory that returns a RepositoryContent
> >>    implementation for the provided repository id.
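For the AnonymousProjectReader item, the first step (telling a Maven 1 pom from a Maven 2 pom by its version element) can be sketched with the JDK's DOM parser. `PomFlavorSniffer` is a hypothetical name, it reads from a String rather than a File for brevity, and loading the rest of the pom information is left out:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: tell a Maven 1 pom from a Maven 2 pom by its version element. */
final class PomFlavorSniffer {
    enum Flavor { MAVEN1, MAVEN2, UNKNOWN }

    static Flavor sniff(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // Maven 2 poms usually declare a default namespace
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml)))
                    .getDocumentElement();
            // Only direct children of <project> are inspected.
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                if ("modelVersion".equals(n.getLocalName())) {
                    return Flavor.MAVEN2;
                }
                if ("pomVersion".equals(n.getLocalName())) {
                    return Flavor.MAVEN1;
                }
            }
            return Flavor.UNKNOWN;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable pom", e);
        }
    }
}
```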
> >>
> >> Example of new RepositoryContent interface.
> >>
> >> --(snip)--
> >> package org.apache.maven.archiva.repository;
> >>
> >> /*
> >> * Licensed to the Apache Software Foundation (ASF) under one
> >> * or more contributor license agreements.  See the NOTICE file
> >> * distributed with this work for additional information
> >> * regarding copyright ownership.  The ASF licenses this file
> >> * to you under the Apache License, Version 2.0 (the
> >> * "License"); you may not use this file except in compliance
> >> * with the License.  You may obtain a copy of the License at
> >> *
> >> *  http://www.apache.org/licenses/LICENSE-2.0
> >> *
> >> * Unless required by applicable law or agreed to in writing,
> >> * software distributed under the License is distributed on an
> >> * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> >> * KIND, either express or implied.  See the License for the
> >> * specific language governing permissions and limitations
> >> * under the License.
> >> */
> >>
> >> import org.apache.maven.archiva.model.ArtifactReference;
> >> import org.apache.maven.archiva.model.ProjectReference;
> >> import org.apache.maven.archiva.model.VersionedReference;
> >> import org.apache.maven.archiva.repository.layout.LayoutException;
> >>
> >> import java.io.File;
> >> import java.net.URL;
> >> import java.util.List;
> >>
> >> /**
> >> * RepositoryContent interface for interacting with a managed repository
> >> * in an abstract way, without the need for processing based on
> >> * filesystem paths, or working with the database.
> >> *
> >> * @author <a href="mailto:joakim@erdfelt.com">Joakim Erdfelt</a>
> >> * @version $Id$
> >> */
> >> public interface RepositoryContent
> >> {
> >>    /**
> >>     * Determines if the project referenced exists in the repository.
> >>     *
> >>     * @param reference the project reference to check for.
> >>     * @return true if the project referenced exists.
> >>     */
> >>    public boolean hasContent( ProjectReference reference );
> >>
> >>    /**
> >>     * Determines if the version reference exists in the repository.
> >>     *
> >>     * @param reference the version reference to check for.
> >>     * @return true if the version referenced exists.
> >>     */
> >>    public boolean hasContent( VersionedReference reference );
> >>
> >>    /**
> >>     * Determines if the artifact referenced exists in the repository.
> >>     *
> >>     * @param reference the artifact reference to check for.
> >>     * @return true if the artifact referenced exists.
> >>     */
> >>    public boolean hasContent( ArtifactReference reference );
> >>
> >>    /**
> >>     * Given a repository relative path to a filename, return the
> >>     * {@link ArtifactReference} object suitable for the path.
> >>     *
> >>     * @param path the path relative to the repository base dir for
> >>     *        the artifact.
> >>     * @return the {@link ArtifactReference} representing the path.
> >>     *        (or null if path cannot be converted to a
> >>     *        {@link ArtifactReference})
> >>     * @throws LayoutException if there was a problem converting the
> >>     *         path to an artifact.
> >>     */
> >>    public ArtifactReference toArtifactReference( String path )
> >>        throws LayoutException;
> >>
> >>    /**
> >>     * Given an ArtifactReference, return the relative path to the
> >>     * artifact.
> >>     *
> >>     * @param reference the artifact reference to use.
> >>     * @return the relative path to the artifact.
> >>     */
> >>    public String toPath( ArtifactReference reference );
> >>
> >>    /**
> >>     * Given an ArtifactReference, return the file reference to the
> >>     * artifact.
> >>     *
> >>     * @param reference the artifact reference to use.
> >>     * @return the file for the artifact.
> >>     */
> >>    public File toFile( ArtifactReference reference );
> >>
> >>    /**
> >>     * Given an ArtifactReference, return the url to the artifact.
> >>     *
> >>     * @param reference the artifact reference to use.
> >>     * @return the url to the artifact.
> >>     */
> >>    public URL toURL( ArtifactReference reference );
> >>
> >>    /**
> >>     * Gather up the list of related artifacts to the ArtifactReference
> >>     * provided. This typically includes the pom files, and those things
> >>     * with classifiers (such as doc, source code, test libs, etc...)
> >>     *
> >>     * NOTE: Some layouts (such as maven 1 "legacy"), and remote
> >>     * repositories are not compatible with this query.
> >>     *
> >>     * @param reference the reference to work off of.
> >>     * @return the list of ArtifactReferences for related artifacts.
> >>     * @throws ContentNotFoundException if the initial artifact reference
> >>     *         does not exist within the repository.
> >>     * @throws NotSupportedException if the layout or repository does not
> >>     *         support this query.
> >>     */
> >>    public List<ArtifactReference> getRelatedArtifacts(
> >>        ArtifactReference reference )
> >>        throws ContentNotFoundException, NotSupportedException;
> >>
> >>    /**
> >>     * Given a specific VersionedReference, return the list of available
> >>     * versions for that versioned reference.
> >>     *
> >>     * NOTE: This is really only useful when working with SNAPSHOTs.
> >>     *       Not compatible with remote repositories.
> >>     *
> >>     * @param reference the versioned reference to work off of.
> >>     * @return the list of versions found.
> >>     * @throws ContentNotFoundException if the versioned reference does
> >>     *         not exist within the repository.
> >>     */
> >>    public List<String> getVersions( VersionedReference reference )
> >>        throws ContentNotFoundException, NotSupportedException;
> >>
> >>    /**
> >>     * Given a specific ProjectReference, return the list of available
> >>     * versions for that project reference.
> >>     *
> >>     * @param reference the project reference to work off of.
> >>     * @return the list of versions found for that project reference.
> >>     * @throws ContentNotFoundException if the project reference does not
> >>     *         exist within the repository.
> >>     */
> >>    public List<String> getVersions( ProjectReference reference )
> >>        throws ContentNotFoundException, NotSupportedException;
> >> }
> >> --(snip)--
> >>
> >> I feel this is a better long term solution for the persistent layout
> >> parsing issues we have within Archiva.  However not all of the problems
> >> have been solved.  I've outlined the ones that need help above in this
> >> email, but I'm sure there are ones that have been overlooked.
> >>
> >> Disclaimer: Yes, this is in the form of a proposal, but I'm already
> >> working on this, and will continue down this path unless
> >> someone here has a strong objection about this approach.
> >>
> >> --
> >> - Joakim Erdfelt
> >>  Committer and PMC Member, Apache Maven
> >>  Archiva Developer
> >>  joakime@apache.org
> >
> > --
> > Brett Porter - brett@apache.org
> > Blog: http://www.devzuz.org/blogs/bporter/
> >
>
>
> --
> - Joakim Erdfelt
>   joakim@erdfelt.com
>   Open Source Software (OSS) Developer
>
>
