archiva-dev mailing list archives

From Joakim Erdfelt <joa...@erdfelt.com>
Subject Re: [Proposal] Repository Layout Detection/Interaction Changes.
Date Fri, 05 Oct 2007 22:19:08 GMT
A few more use cases to worry about.

Use Case 1:
Request on "/repository/" url for maven-metadata.xml.

Problem: Assume Default Layout for all maven-metadata.xml requests?

Use Case 2:
Request on "/repository/" url using maven 1 legacy style for content 
that doesn't exist and needs to arrive via proxy connector from remote 
repository?

Problem: We can't parse the pom (we don't have it yet) to get the
ArtifactReference needed to make the request to the remote repository.

Use Case 3:
Request on "/repository/" url using maven 1 legacy style, but for 
content that is stored in default layout.

Problem: Catch-22. We need to parse the path into an Artifact Reference in
order to load the pom, but we need the pom in order to know the Artifact
Reference.

- Joakim

If a request arrives into the "/repository/" url tha

Joakim Erdfelt wrote:
>
> This is a long email, read it all before commenting, and you'll likely
> see a response to your earlier questions. :-)
>
> I'm currently working on MRM-432 and MRM-519, and I'm in the middle of an
> important change to how Archiva handles Layout detection, interaction,
> and parsing.
>
> :Background:
>
> Layouts in Archiva have 2 main purposes.
>
>  1. to convert a path to an artifact reference.
>  2. to convert an artifact reference to a path.
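To make the second purpose concrete, here is a minimal sketch of the reference-to-path direction for both layouts. The class and method names are made up for this email and are not the existing Archiva API. It also assumes the file extension equals the artifact type, which is not true in general.

```java
/** Hypothetical helper for illustration only; not an existing Archiva class. */
final class LayoutPathSketch {
    private LayoutPathSketch() {}

    /** Default (maven 2) layout: dots in the groupId become directories. */
    static String toDefaultPath(String groupId, String artifactId, String baseVersion,
                                String version, String classifier, String type) {
        StringBuilder path = new StringBuilder();
        path.append(groupId.replace('.', '/')).append('/');
        path.append(artifactId).append('/');
        path.append(baseVersion).append('/');
        path.append(artifactId).append('-').append(version);
        if (classifier != null && !classifier.isEmpty()) {
            path.append('-').append(classifier);
        }
        // Assumption: extension == type (see the "Type" caveat later in the email).
        return path.append('.').append(type).toString();
    }

    /** Legacy (maven 1) layout: groupId/types/artifactId-version.type */
    static String toLegacyPath(String groupId, String artifactId, String version, String type) {
        return groupId + "/" + type + "s/" + artifactId + "-" + version + "." + type;
    }
}
```

The two sample calls I checked against reproduce paths from the example list further down (commons-lang 2.1 in legacy form, and the redonkulous timestamped snapshot in default form).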
>
> Layouts are used by the following.
>
>  1. The "/repository/${repoid}/" urls use layouts to determine the
>     Artifact Reference that the client is requesting.
>     The "/repository/" url is layout neutral, and can have maven 1
>     clients ask for content in legacy format, or maven 2 clients ask
>     for content in default layout.
>  2. Proxy requests out to remote repositories utilize layouts to take
>     an internal Artifact Reference, convert it to a path appropriate
>     to the remote layout configuration and obtain the content that is
>     desired.
>  3. Simple Consumers utilize layouts to obtain File references, and
>     Artifact References to the repository content for purposes of
>     operating on the content in a way that they desire.
>  4. Complex consumers (such as metadata updater, and snapshots purge)
>     utilize layouts to obtain lists of versions and artifacts.
>
> What Works.
>
>  * Converting an Artifact Reference to a path.
>  * Discovering Versions in a default layout.
>    (needed by metadata update / snapshot purge)
>  * Converting a default layout path to an Artifact Reference correctly.
>
> What Doesn't Work.
>
>  * Detecting the layout in use 100% of the time.
>  * Converting a legacy layout path to an Artifact Reference 100% of
>    the time.
>  * Discovering versions in a legacy layout.
>    (do we need metadata update / snapshot purge here?)
>  * Reporting problems correctly.
>
> :The Problem:
>
> We are unable to parse useful information in a consistent way for all
> provided paths.
> Specifically, we need to glean the following information from the path.
>
>  * Layout Type (default / legacy)
>  * Group ID
>  * Artifact ID
>  * Version (Deployed version & Base version)
>  * Classifier (Not applicable in legacy layout)
>  * Type (Not the same as Extension)
>
> Example Paths: (included in this email for discussion, actual list
>                from test cases)
>
> groupId/jars/-1.0.jar
> org.apache.maven.test/jars/artifactId-1.0.war
> ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
> javax/jars/comm-3.0-u1.jar
> javax.persistence/jars/ejb-3.0-public_review.jar
> maven/jars/maven-test-plugin-1.8.2.jar
> commons-lang/jars/commons-lang-2.1.jar
> org.apache.derby/jars/derby-10.2.2.0.jar
> com.foo/ejbs/foo-client-1.0.jar
> com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar
> com.foo.lib/java-sources/foo-lib-2.1-alpha-1-sources.jar
> com.foo/jars/foo-tool-1.0.jar
> org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
> directory-clients/poms/ldap-clients-0.9.1-SNAPSHOT.pom
> org.apache.archiva.test/jars/redonkulous-3.1-beta-1-20050831.101112-42.jar 
>
> invalid/invalid/1.0-20050611.123456-1/invalid-1.0-20050611.123456-1.jar
> ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar
> javax/comm/3.0-u1/comm-3.0-u1.jar
> javax/persistence/ejb/3.0-public_review/ejb-3.0-public_review.jar
> maven/maven-test-plugin/1.8.2/maven-test-plugin-1.8.2.pom
> test/maven-arch/test-arch/2.0.3-SNAPSHOT/test-arch-2.0.3-SNAPSHOT.pom
> com/company/department/com.company.department/0.2/com.company.department-0.2.pom
> com/company/department/com.company.department.project/0.3/com.company.department.project-0.3.pom
> com/foo/foo-tool/1.0/foo-tool-1.0.jar
> commons-lang/commons-lang/2.1/commons-lang-2.1.jar
> com/foo/foo-client/1.0/foo-client-1.0.jar
> com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar
> org/apache/archiva/test/redonkulous/3.1-beta-1-SNAPSHOT/redonkulous-3.1-beta-1-20050831.101112-42.jar
>
>
> :Proposal:
>
> The proposed logic for detecting layout.
>
>  1. Split path by directory separators.
>  2. If more than 3 parts ( dir/dir/dir/dir/filename ) == default layout.
>  3. If less than 3 parts ( dir/filename ) == invalid path.
>  4. If 3 parts ( dir/dir/filename ) then
>     4.1. If part 2 name ends in "s" then test for potential legacy
>          layout.
>          4.1.1. Identify filename extension.
>          4.1.2. Get potential list of artifact types for extension.
>          4.1.3. If part 2 (minus the end "s")  is in the list of
>                 artifact types == legacy layout
>     4.2. Can't be legacy, then hand off to default layout.
>
>  The problem with this approach is maintaining the list of extensions to
>  artifact type.  The artifact type is arbitrary, and can be expanded
>  upon by the user to include types that we can't even imagine today.
>  (See MRM-481: issue with extension .xml.zip)
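A rough sketch of the detection steps above, to make the control flow easier to discuss. Everything here is hypothetical: the class name, the enum, and especially the extension-to-type table, which is exactly the list that is hard to maintain. The path is assumed to be repository relative, with "/" separators and no leading slash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed layout detection; not existing Archiva code. */
final class LayoutDetectorSketch {
    enum Layout { DEFAULT, LEGACY, INVALID }

    // Assumed, incomplete table of extension -> possible artifact types.
    private static final Map<String, List<String>> TYPES_BY_EXTENSION = new HashMap<>();
    static {
        TYPES_BY_EXTENSION.put("jar",
            Arrays.asList("jar", "ejb", "plugin", "javadoc.jar", "java-source"));
        TYPES_BY_EXTENSION.put("war", Arrays.asList("war"));
        TYPES_BY_EXTENSION.put("pom", Arrays.asList("pom"));
    }

    private LayoutDetectorSketch() {}

    static Layout detect(String path) {
        String[] parts = path.split("/");              // step 1
        if (parts.length > 3) return Layout.DEFAULT;   // step 2
        if (parts.length < 3) return Layout.INVALID;   // step 3

        // step 4: exactly dir/dir/filename
        String typeDir = parts[1];
        String filename = parts[2];
        int dot = filename.lastIndexOf('.');
        if (typeDir.endsWith("s") && dot >= 0) {       // step 4.1
            String extension = filename.substring(dot + 1);                     // 4.1.1
            List<String> types = TYPES_BY_EXTENSION.get(extension);             // 4.1.2
            String candidateType = typeDir.substring(0, typeDir.length() - 1);
            if (types != null && types.contains(candidateType)) {               // 4.1.3
                return Layout.LEGACY;
            }
        }
        return Layout.DEFAULT;                         // step 4.2
    }
}
```

Note that with this table "org.apache.maven.test/jars/artifactId-1.0.war" falls through to the default layout, because "jar" is not a known type for the "war" extension. Whether that is the desired result is part of the open question.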
>
> The proposed logic for parsing default layout paths.
>
>  This one is easy.  paths are in the following format ...
>
>  ${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}-${classifier}.${type}
>
>
>  Once we separate out the directories from the filename, we get the
>  following order.
>
>  dirs[dirs.length-1] = base version.
>  dirs[dirs.length-2] = artifact Id.
>  dirs[0] thru dirs[dirs.length-3] = groupId.
>
>  That gives us the crucial pieces in the filename
>  ${artifactId}-${version}, which makes detecting the classifier and
>  type easy enough.
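Here is a small sketch of that default layout parsing, using zero-based indexes so the last directory is the base version. The class and field names are invented for illustration. It only splits classifier and extension when the filename starts with ${artifactId}-${baseVersion}; timestamped snapshots (where the deployed version differs from the base version) are left with those two fields null, since that case needs more logic than fits here.

```java
import java.util.Arrays;

/** Hypothetical sketch of default layout path parsing; not existing Archiva code. */
final class DefaultPathParserSketch {
    final String groupId;
    final String artifactId;
    final String baseVersion;
    final String classifier; // null when absent or not determined
    final String extension;  // null when not determined

    private DefaultPathParserSketch(String groupId, String artifactId, String baseVersion,
                                    String classifier, String extension) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.baseVersion = baseVersion;
        this.classifier = classifier;
        this.extension = extension;
    }

    static DefaultPathParserSketch parse(String path) {
        String[] parts = path.split("/");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Not a default layout path: " + path);
        }
        String filename = parts[parts.length - 1];
        String[] dirs = Arrays.copyOf(parts, parts.length - 1);

        String baseVersion = dirs[dirs.length - 1];
        String artifactId = dirs[dirs.length - 2];
        String groupId = String.join(".", Arrays.copyOf(dirs, dirs.length - 2));

        String classifier = null;
        String extension = null;
        String prefix = artifactId + "-" + baseVersion;
        if (filename.startsWith(prefix)) {
            String rest = filename.substring(prefix.length()); // "-classifier.ext" or ".ext"
            int dot = rest.indexOf('.');
            if (dot >= 0) {
                if (dot > 0 && rest.charAt(0) == '-') {
                    classifier = rest.substring(1, dot);
                }
                extension = rest.substring(dot + 1);
            }
        }
        return new DefaultPathParserSketch(groupId, artifactId, baseVersion, classifier, extension);
    }
}
```

I deliberately return an extension here rather than a type, since mapping one to the other is a separate problem.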
>
> The proposed logic for parsing legacy layout paths.
>
>  Legacy layouts are tricky.  It is nearly impossible to detect, using
>  the path alone, the correct artifactId or version.  So the process
>  will need to read the pom file associated with the artifact Id in order
>  to determine the correct Artifact Reference pieces.
>
>  The problem with this approach is that we now need 2 pieces of
>  information, the repository root (location or url) and the path.
>  Plus we incur a hit / read of the pom file.
>
>  So, if we use the pseudo-pattern ...
>  [:groupId:]/[:type:]s/[:filename:].[:ext:]
>  as a starting point, swap out the [:type:] and [:ext:] for "pom" and
>  load the pom from the actual repository to determine the groupId,
>  artifactId, and version, we can then have a valid Artifact Reference.
>
>  The problem with relying on the pom is that it is now required for
>  legacy layout "from path" logic, this changes the assumption that poms
>  are optional and not required, as well as changing the interface
>  to the layout objects to needing a repository as well.
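The "swap out the type and ext for pom" step could look roughly like the sketch below. The class name is hypothetical. It naively strips everything after the last dot, so it will guess the wrong pom for files that carry a classifier-like suffix (for example foo-lib-2.1-alpha-1-javadoc.jar) or a multi-part extension.

```java
/** Hypothetical sketch of locating the pom for a legacy layout path. */
final class LegacyPomPathSketch {
    private LegacyPomPathSketch() {}

    /**
     * Turns groupId/types/filename.ext into groupId/poms/filename.pom.
     * The result is only a candidate location; the pom may not exist.
     */
    static String toPomPath(String legacyPath) {
        String[] parts = legacyPath.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not a legacy layout path: " + legacyPath);
        }
        String filename = parts[2];
        int dot = filename.lastIndexOf('.');
        String base = (dot >= 0) ? filename.substring(0, dot) : filename;
        return parts[0] + "/poms/" + base + ".pom";
    }
}
```

The caller would still have to resolve this against the repository root and read the pom, which is the extra cost described above.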
>
> The proposed changes to the codebase.
>
>  * Eliminate RepositoryLayoutUtils, roll layout specific filename
>    parsing routines into their respective layouts.
>  * Eliminate direct usage of BidirectionalRepositoryLayout by
>    consumers.
>  * Create RepositoryContentRequest that takes the freeform requests
>    arriving in from the "/repository/" urls and puts it through
>    the logic as outlined above.
>  * Rename BidirectionalRepositoryLayout interface to RepositoryContent
>    to simplify name and represent new role of accessing repository
>    content that requires a repository reference.
>  * Create DefaultRepositoryContent and LegacyRepositoryContent
>    implementations, that utilize techniques described above, and
>    logic already present in DefaultBidirectionalRepositoryLayout and
>    LegacyBidirectionalRepositoryLayout.
>  * Create AnonymousProjectReader that takes a File object pointing to
>    a pom, read the <pomVersion> or <modelVersion> elements and load
>    the pom information as appropriate.
>  * Create RepositoryContentFactory that returns a RepositoryContent
>    implementation for the provided repository id.
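For the pom reading item, the version check might be as simple as the sketch below: look at the direct children of the root element and see whether <pomVersion> (maven 1) or <modelVersion> (maven 2) shows up. The class and enum names are placeholders, and it reads from an InputStream instead of a File only to keep the example easy to exercise. Loading the actual model is left out.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of telling a maven 1 pom from a maven 2 pom. */
final class PomVersionSnifferSketch {
    enum PomKind { MAVEN1, MAVEN2, UNKNOWN }

    private PomVersionSnifferSketch() {}

    static PomKind sniff(InputStream pom) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(pom)
                .getDocumentElement();
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers see one unchecked exception type.
            throw new IllegalArgumentException("Unable to parse pom", e);
        }

        // Only inspect direct children of the root element.
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String name = child.getNodeName();
            if ("pomVersion".equals(name)) {
                return PomKind.MAVEN1;
            }
            if ("modelVersion".equals(name)) {
                return PomKind.MAVEN2;
            }
        }
        return PomKind.UNKNOWN;
    }
}
```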
>
> Example of new RepositoryContent interface.
>
> --(snip)--
> package org.apache.maven.archiva.repository;
>
> /*
> * Licensed to the Apache Software Foundation (ASF) under one
> * or more contributor license agreements.  See the NOTICE file
> * distributed with this work for additional information
> * regarding copyright ownership.  The ASF licenses this file
> * to you under the Apache License, Version 2.0 (the
> * "License"); you may not use this file except in compliance
> * with the License.  You may obtain a copy of the License at
> *
> *  http://www.apache.org/licenses/LICENSE-2.0
> *
> * Unless required by applicable law or agreed to in writing,
> * software distributed under the License is distributed on an
> * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> * KIND, either express or implied.  See the License for the
> * specific language governing permissions and limitations
> * under the License.
> */
>
> import org.apache.maven.archiva.model.ArtifactReference;
> import org.apache.maven.archiva.model.ProjectReference;
> import org.apache.maven.archiva.model.VersionedReference;
> import org.apache.maven.archiva.repository.layout.LayoutException;
>
> import java.io.File;
> import java.net.URL;
> import java.util.List;
>
> /**
> * RepositoryContent interface for interacting with a managed repository
> * in an abstract way, without the need for processing based on
> * filesystem paths, or working with the database.
> *
> * @author <a href="mailto:joakim@erdfelt.com">Joakim Erdfelt</a>
> * @version $Id$
> */
> public interface RepositoryContent
> {
>    /**
>     * Determines if the project referenced exists in the repository.
>     *
>     * @param reference the project reference to check for.
>     * @return true if the project referenced exists.
>     */
>    public boolean hasContent( ProjectReference reference );
>
>    /**
>     * Determines if the version reference exists in the repository.
>     *
>     * @param reference the version reference to check for.
>     * @return true if the version referenced exists.
>     */
>    public boolean hasContent( VersionedReference reference );
>
>    /**
>     * Determines if the artifact referenced exists in the repository.
>     *
>     * @param reference the artifact reference to check for.
>     * @return true if the artifact referenced exists.
>     */
>    public boolean hasContent( ArtifactReference reference );
>
>    /**
>     * Given a repository relative path to a filename, return the
>     * {@link ArtifactReference} object suitable for the path.
>     *
>     * @param path the path relative to the repository base dir for
>     *        the artifact.
>     * @return the {@link ArtifactReference} representing the path.
>     *        (or null if path cannot be converted to a
>     *        {@link ArtifactReference})
>     * @throws LayoutException if there was a problem converting the
>     *         path to an artifact.
>     */
>    public ArtifactReference toArtifactReference( String path )
>        throws LayoutException;
>
>    /**
>     * Given an ArtifactReference, return the relative path to the
>     * artifact.
>     *
>     * @param reference the artifact reference to use.
>     * @return the relative path to the artifact.
>     */
>    public String toPath( ArtifactReference reference );
>
>    /**
>     * Given an ArtifactReference, return the file reference to the
>     * artifact.
>     *
>     * @param reference the artifact reference to use.
>     * @return the file reference to the artifact.
>     */
>    public File toFile( ArtifactReference reference );
>
>    /**
>     * Given an ArtifactReference, return the url to the artifact.
>     *
>     * @param reference the artifact reference to use.
>     * @return the url to the artifact.
>     */
>    public URL toURL( ArtifactReference reference );
>
>
>    /**
>     * Gather up the list of related artifacts to the ArtifactReference
>     * provided. This typically includes the pom files, and those things
>     * with classifiers (such as doc, source code, test libs, etc...)
>     *
>     * NOTE: Some layouts (such as maven 1 "legacy"), and remote
>     * repositories are not compatible with this query.
>     *
>     * @param reference the reference to work off of.
>     * @return the list of ArtifactReferences for related artifacts.
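>     * @throws ContentNotFoundException if the initial artifact reference
>     *         does not exist within the repository.
>     * @throws NotSupportedException if the layout or repository is not
>     *         compatible with this query.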
>     */
>    public List<ArtifactReference> getRelatedArtifacts(
>        ArtifactReference reference )
>        throws ContentNotFoundException, NotSupportedException;
>
>    /**
>     * Given a specific VersionedReference, return the list of available
>     * versions for that versioned reference.
>     *
>     * NOTE: This is really only useful when working with SNAPSHOTs.
>     *       Not compatible with remote repositories.
>     *
>     * @param reference the versioned reference to work off of.
>     * @return the list of versions found.
>     * @throws ContentNotFoundException if the versioned reference does
>     *         not exist within the repository.
>     */
>    public List<String> getVersions( VersionedReference reference )
>        throws ContentNotFoundException, NotSupportedException;
>
>    /**
>     * Given a specific ProjectReference, return the list of available
>     * versions for that project reference.
>     *
>     * @param reference the project reference to work off of.
>     * @return the list of versions found for that project reference.
>     * @throws ContentNotFoundException if the project reference does not
>     *         exist within the repository.
>     */
>    public List<String> getVersions( ProjectReference reference )
>        throws ContentNotFoundException, NotSupportedException;
> }
> --(snip)--
>
> I feel this is a better long term solution for the persistent layout
> parsing issues we have within Archiva.  However not all of the problems
> have been solved.  I've outlined the ones that need help above in this
> email, but I'm sure there are ones that have been overlooked.
>
> Disclaimer: Yes, this is in the form of a proposal, but I'm already
> working on this, and will continue down this path unless
> someone here has a strong objection about this approach.
>


-- 
- Joakim Erdfelt
  joakim@erdfelt.com
  Open Source Software (OSS) Developer

