archiva-dev mailing list archives

From Joakim Erdfelt <joak...@apache.org>
Subject [Proposal] Repository Layout Detection/Interaction Changes.
Date Fri, 05 Oct 2007 21:58:59 GMT

This is a long email. Please read all of it before commenting; you will
likely find answers to your earlier questions. :-)

I'm currently working on MRM-432 and MRM-519, and I'm in the middle of an
important change to how Archiva handles layout detection, interaction,
and parsing.

:Background:

Layouts in Archiva have 2 main purposes.

  1. to convert a path to an artifact reference.
  2. to convert an artifact reference to a path.
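For purpose 2, the reference-to-path direction is mechanical in both layouts. The sketch below is only an illustration: `LayoutPaths`, `toDefaultPath`, and `toLegacyPath` are hypothetical names, not Archiva classes, and the legacy variant assumes no classifier (the legacy layout has none).

```java
// Hypothetical helper (not an Archiva class): builds repository-relative
// paths from the pieces of an artifact reference.
final class LayoutPaths {
    private LayoutPaths() {}

    /**
     * Default (maven 2) layout:
     * ${groupId as dirs}/${artifactId}/${baseVersion}/${artifactId}-${version}[-${classifier}].${extension}
     */
    static String toDefaultPath(String groupId, String artifactId, String baseVersion,
                                String version, String classifier, String extension) {
        StringBuilder path = new StringBuilder();
        path.append(groupId.replace('.', '/')).append('/')  // dots become directories
            .append(artifactId).append('/')
            .append(baseVersion).append('/')
            .append(artifactId).append('-').append(version);
        if (classifier != null && !classifier.isEmpty()) {
            path.append('-').append(classifier);              // classifier is optional
        }
        return path.append('.').append(extension).toString();
    }

    /**
     * Legacy (maven 1) layout: ${groupId}/${type}s/${artifactId}-${version}.${extension}
     * The groupId keeps its dots; there is no classifier.
     */
    static String toLegacyPath(String groupId, String type, String artifactId,
                               String version, String extension) {
        return groupId + "/" + type + "s/" + artifactId + "-" + version + "." + extension;
    }
}
```

For example, `toLegacyPath( "com.foo", "ejb", "foo-client", "1.0", "jar" )` yields `com.foo/ejbs/foo-client-1.0.jar`, one of the example paths further down. The harder direction, path to reference, is what the rest of this email is about.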

Layouts are used by the following.

  1. The "/repository/${repoid}/" urls use layouts to determine the
     Artifact Reference that the client is requesting.
     The "/repository/" url is layout-neutral: maven 1 clients can ask
     for content in the legacy layout, and maven 2 clients can ask for
     content in the default layout.
  2. Proxy requests out to remote repositories utilize layouts to take
     an internal Artifact Reference, convert it to a path appropriate
     to the remote layout configuration and obtain the content that is
     desired.
  3. Simple consumers use layouts to obtain File references and
     Artifact References for the repository content, so that they can
     operate on that content as they need to.
  4. Complex consumers (such as metadata updater, and snapshots purge)
     utilize layouts to obtain lists of versions and artifacts.

What Works.

  * Converting an Artifact Reference to a path.
  * Discovering Versions in a default layout.
    (needed by metadata update / snapshot purge)
  * Converting a default layout path to an Artifact Reference correctly.

What Doesn't Work.

  * Detecting the layout in use 100% of the time.
  * Converting a legacy layout path to an Artifact Reference 100% of
    the time.
  * Discovering versions in a legacy layout.
    (do we need metadata update / snapshot purge here?)
  * Reporting problems correctly.

:The Problem:

We cannot parse useful information consistently from every provided
path. Specifically, we need to glean the following information from
the path.

  * Layout Type (default / legacy)
  * Group ID
  * Artifact ID
  * Version (Deployed version & Base version)
  * Classifier (Not applicable in legacy layout)
  * Type (Not the same as Extension)
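The deployed version and the base version differ only for timestamped snapshots: in the default-layout example below, `redonkulous-3.1-beta-1-20050831.101112-42.jar` lives in the `3.1-beta-1-SNAPSHOT` directory. A minimal sketch of that mapping, assuming the usual maven 2 `-yyyyMMdd.HHmmss-buildNumber` suffix; `VersionUtil` and its methods are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Archiva class).
final class VersionUtil {
    // A deployed (timestamped) snapshot ends in -yyyyMMdd.HHmmss-buildNumber,
    // e.g. 3.1-beta-1-20050831.101112-42.
    private static final Pattern TIMESTAMPED =
        Pattern.compile("^(.+)-(\\d{8}\\.\\d{6})-(\\d+)$");

    private VersionUtil() {}

    static boolean isTimestampedSnapshot(String version) {
        return TIMESTAMPED.matcher(version).matches();
    }

    /** Maps a deployed version to the base version used for the version directory. */
    static String toBaseVersion(String version) {
        Matcher m = TIMESTAMPED.matcher(version);
        if (m.matches()) {
            return m.group(1) + "-SNAPSHOT";
        }
        return version; // releases and literal -SNAPSHOT versions are unchanged
    }
}
```

Under this rule, the `invalid/invalid/1.0-20050611.123456-1/...` example is suspicious because its version directory holds a timestamped version instead of a base version.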

Example Paths: (included in this email for discussion; the actual list
                comes from the test cases)

groupId/jars/-1.0.jar
org.apache.maven.test/jars/artifactId-1.0.war
ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
javax/jars/comm-3.0-u1.jar
javax.persistence/jars/ejb-3.0-public_review.jar
maven/jars/maven-test-plugin-1.8.2.jar
commons-lang/jars/commons-lang-2.1.jar
org.apache.derby/jars/derby-10.2.2.0.jar
com.foo/ejbs/foo-client-1.0.jar
com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar
com.foo.lib/java-sources/foo-lib-2.1-alpha-1-sources.jar
com.foo/jars/foo-tool-1.0.jar
org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
directory-clients/poms/ldap-clients-0.9.1-SNAPSHOT.pom
org.apache.archiva.test/jars/redonkulous-3.1-beta-1-20050831.101112-42.jar
invalid/invalid/1.0-20050611.123456-1/invalid-1.0-20050611.123456-1.jar
ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar
javax/comm/3.0-u1/comm-3.0-u1.jar
javax/persistence/ejb/3.0-public_review/ejb-3.0-public_review.jar
maven/maven-test-plugin/1.8.2/maven-test-plugin-1.8.2.pom
test/maven-arch/test-arch/2.0.3-SNAPSHOT/test-arch-2.0.3-SNAPSHOT.pom
com/company/department/com.company.department/0.2/com.company.department-0.2.pom
com/company/department/com.company.department.project/0.3/com.company.department.project-0.3.pom
com/foo/foo-tool/1.0/foo-tool-1.0.jar
commons-lang/commons-lang/2.1/commons-lang-2.1.jar
com/foo/foo-client/1.0/foo-client-1.0.jar
com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar
org/apache/archiva/test/redonkulous/3.1-beta-1-SNAPSHOT/redonkulous-3.1-beta-1-20050831.101112-42.jar

:Proposal:

The proposed logic for detecting layout.

  1. Split path by directory separators.
  2. If more than 3 parts ( dir/dir/dir/dir/filename ) == default layout.
  3. If fewer than 3 parts ( dir/filename ) == invalid path.
  4. If 3 parts ( dir/dir/filename ) then
     4.1. If part 2 name ends in "s" then test for potential legacy layout.
          4.1.1. Identify filename extension.
          4.1.2. Get potential list of artifact types for extension.
          4.1.3. If part 2 (minus the trailing "s") is in the list of
                 artifact types == legacy layout
     4.2. Can't be legacy, then hand off to default layout.

  The problem with this approach is maintaining the mapping of
  extensions to artifact types.  The artifact type is arbitrary, and
  users can add types that we can't even imagine today.
  (See MRM-481: issue with extension .xml.zip)
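The detection steps above could be sketched as follows. `LayoutDetector` is a hypothetical name, and the extension-to-types table is a hand-written assumption. Both the table and the last-dot extension rule are exactly the weak points just described: a `.xml.zip` file would be read as extension `zip`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed detection steps; names are illustrative.
final class LayoutDetector {
    enum Layout { DEFAULT, LEGACY, INVALID }

    // Step 4.1.2 needs an extension -> artifact types table. This one is a
    // hand-maintained assumption and will never be complete.
    private static final Map<String, Set<String>> TYPES_BY_EXTENSION = new HashMap<>();
    static {
        TYPES_BY_EXTENSION.put("jar", new HashSet<>(Arrays.asList(
            "jar", "ejb", "plugin", "javadoc.jar", "java-source")));
        TYPES_BY_EXTENSION.put("pom", Collections.singleton("pom"));
        TYPES_BY_EXTENSION.put("war", Collections.singleton("war"));
    }

    private LayoutDetector() {}

    static Layout detect(String path) {
        // Step 1: split on directory separators, ignoring empty segments.
        String[] parts = Arrays.stream(path.split("[/\\\\]+"))
            .filter(s -> !s.isEmpty()).toArray(String[]::new);
        if (parts.length > 3) return Layout.DEFAULT;   // step 2
        if (parts.length < 3) return Layout.INVALID;   // step 3

        // Step 4: exactly dir/dir/filename.
        String typeDir = parts[1];
        String filename = parts[2];
        int dot = filename.lastIndexOf('.');
        if (typeDir.endsWith("s") && dot > 0) {                       // 4.1
            String extension = filename.substring(dot + 1);          // 4.1.1 (last dot only)
            Set<String> types = TYPES_BY_EXTENSION.getOrDefault(     // 4.1.2
                extension, Collections.emptySet());
            String type = typeDir.substring(0, typeDir.length() - 1);
            if (types.contains(type)) return Layout.LEGACY;          // 4.1.3
        }
        return Layout.DEFAULT;                                        // 4.2
    }
}
```

Note that `org.apache.maven.test/jars/artifactId-1.0.war` falls through to the default layout here, because "jar" is not a type listed for the `war` extension.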

The proposed logic for parsing default layout paths.

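  This one is easy.  Paths are in the following format ...

  ${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}-${classifier}.${extension}

  (The dots in ${groupId} become directory separators, and the
  -${classifier} part is optional.)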

  Once we separate the directories from the filename, we get the
  following order.

  dirs[dirs.length-1] = base version.
  dirs[dirs.length-2] = artifact Id.
  dirs[0] thru dirs[dirs.length-3] = groupId.

  That gives us the crucial pieces in the filename
  ${artifactId}-${version}, which makes detecting the classifier and
  type easy enough.
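A minimal sketch of that parsing, assuming the default-layout rules above plus two assumptions of our own: timestamped snapshots appear only under an `X-SNAPSHOT` directory, and the extension is whatever follows the last dot. `DefaultPathParser` and `Parsed` are hypothetical names; the parser returns null instead of throwing LayoutException to keep the sketch small.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; DefaultPathParser and Parsed are illustrative names.
final class DefaultPathParser {
    static final class Parsed {
        final String groupId, artifactId, baseVersion, version, classifier, extension;
        Parsed(String g, String a, String bv, String v, String c, String e) {
            groupId = g; artifactId = a; baseVersion = bv; version = v; classifier = c; extension = e;
        }
    }

    // Timestamp part of a deployed snapshot: yyyyMMdd.HHmmss-buildNumber
    private static final Pattern TIMESTAMP = Pattern.compile("\\d{8}\\.\\d{6}-\\d+");
    private static final Pattern TIMESTAMPED_VERSION = Pattern.compile(".+-\\d{8}\\.\\d{6}-\\d+");

    private DefaultPathParser() {}

    /** Returns the parsed pieces, or null if the path does not fit the default layout. */
    static Parsed parse(String path) {
        String[] parts = Arrays.stream(path.split("/"))
            .filter(s -> !s.isEmpty()).toArray(String[]::new);
        if (parts.length < 4) return null; // groupId (1+ dirs), artifactId, baseVersion, filename
        String filename = parts[parts.length - 1];
        String[] dirs = Arrays.copyOf(parts, parts.length - 1);

        String baseVersion = dirs[dirs.length - 1];
        String artifactId = dirs[dirs.length - 2];
        String groupId = String.join(".", Arrays.copyOfRange(dirs, 0, dirs.length - 2));
        // Assumption: a timestamped version is never a valid version directory.
        if (TIMESTAMPED_VERSION.matcher(baseVersion).matches()) return null;

        String prefix = artifactId + "-";
        int dot = filename.lastIndexOf('.');
        if (!filename.startsWith(prefix) || dot < prefix.length()) return null;
        String extension = filename.substring(dot + 1);
        String rest = filename.substring(prefix.length(), dot); // ${version}[-${classifier}]

        String version;
        if (rest.startsWith(baseVersion)) {
            version = baseVersion;
        } else if (baseVersion.endsWith("-SNAPSHOT")) {
            // Deployed snapshot: "SNAPSHOT" is replaced by timestamp and build number.
            String stem = baseVersion.substring(0, baseVersion.length() - "SNAPSHOT".length());
            if (!rest.startsWith(stem)) return null;
            Matcher m = TIMESTAMP.matcher(rest);
            m.region(stem.length(), rest.length());
            if (!m.lookingAt()) return null;
            version = stem + m.group();
        } else {
            return null;
        }

        String remainder = rest.substring(version.length());
        String classifier = null;
        if (!remainder.isEmpty()) {
            if (!remainder.startsWith("-") || remainder.length() == 1) return null;
            classifier = remainder.substring(1);
        }
        return new Parsed(groupId, artifactId, baseVersion, version, classifier, extension);
    }
}
```

The last-dot rule is deliberately naive; a `.tar.gz` or `.xml.zip` artifact would need the same extension knowledge that makes legacy detection hard.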

The proposed logic for parsing legacy layout paths.

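  Legacy layouts are tricky.  Using the path alone, it is nearly
  impossible to determine the correct artifactId or version (is
  "ganymed-ssh2-build210.jar" artifactId "ganymed-ssh2" with version
  "build210", or artifactId "ganymed" with version "ssh2-build210"?).
  So the process will need to read the pom file associated with the
  artifact in order to determine the correct Artifact Reference pieces.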

  The problem with this approach is that we now need 2 pieces of
  information, the repository root (location or url) and the path.
  Plus we incur a hit / read of the pom file.

  So, if we use the pseudo-pattern ...
  [:groupId:]/[:type:]s/[:filename:].[:ext:]
  as a starting point, swap out the [:type:] and [:ext:] for "pom", and
  load the pom from the actual repository to determine the groupId,
  artifactId, and version, we can then have a valid Artifact Reference.
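The path rewrite in that pseudo-pattern is simple string work; reading the pom is the expensive part. A sketch of just the rewrite, where `LegacyPomLocator` and `toPomPath` are hypothetical names and the result would still need to be resolved against the repository root:

```java
// Hypothetical helper illustrating the "swap [:type:] and [:ext:] for pom" idea.
final class LegacyPomLocator {
    private LegacyPomLocator() {}

    /**
     * Maps [:groupId:]/[:type:]s/[:filename:].[:ext:] to
     * [:groupId:]/poms/[:filename:].pom, or returns null when the path
     * does not have that three-part shape.
     */
    static String toPomPath(String legacyPath) {
        String[] parts = legacyPath.split("/");
        if (parts.length != 3 || !parts[1].endsWith("s")) return null;
        int dot = parts[2].lastIndexOf('.');   // [:ext:] is taken after the last dot
        if (dot <= 0) return null;
        return parts[0] + "/poms/" + parts[2].substring(0, dot) + ".pom";
    }
}
```

A literal swap such as this one turns `com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar` into `com.foo.lib/poms/foo-lib-2.1-alpha-1-javadoc.pom`, which is probably not the pom that was deployed, so the lookup would likely need extra rules for such artifacts.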

  The problem with relying on the pom is that it becomes required for
  the legacy layout "from path" logic.  This breaks the current
  assumption that poms are optional, and it changes the layout
  interface so that it needs a repository as well.

The proposed changes to the codebase.

  * Eliminate RepositoryLayoutUtils, roll layout-specific filename
    parsing routines into their respective layouts.
  * Eliminate direct usage of BidirectionalRepositoryLayout by
    consumers.
  * Create RepositoryContentRequest that takes the freeform requests
    arriving from the "/repository/" urls and puts them through
    the logic outlined above.
  * Rename BidirectionalRepositoryLayout interface to RepositoryContent
    to simplify name and represent new role of accessing repository
    content that requires a repository reference.
  * Create DefaultRepositoryContent and LegacyRepositoryContent
    implementations, that utilize techniques described above, and
    logic already present in DefaultBidirectionalRepositoryLayout and
    LegacyBidirectionalRepositoryLayout.
  * Create AnonymousProjectReader that takes a File object pointing to
    a pom, reads the <pomVersion> or <modelVersion> element, and loads
    the pom information as appropriate.
  * Create RepositoryContentFactory that returns a RepositoryContent
    implementation for the provided repository id.
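One possible shape for the AnonymousProjectReader idea, using only the JDK's DOM parser. This is a sketch under stated assumptions: the class and method names are hypothetical, only top-level elements are read (no parent inheritance, no property interpolation), and a maven 1 `project.xml` is assumed to carry its version in `<currentVersion>`.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch of the proposed AnonymousProjectReader.
final class AnonymousProjectReaderSketch {
    static final class ProjectInfo {
        final String modelType, groupId, artifactId, version;
        ProjectInfo(String t, String g, String a, String v) {
            modelType = t; groupId = g; artifactId = a; version = v;
        }
    }

    private AnonymousProjectReaderSketch() {}

    static ProjectInfo read(File pom) throws Exception {
        try (InputStream in = new FileInputStream(pom)) {
            return read(in);
        }
    }

    static ProjectInfo read(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element project = builder.parse(in).getDocumentElement();
        if (child(project, "modelVersion") != null) {
            // Maven 2 pom: <groupId>, <artifactId>, <version>.
            return new ProjectInfo("maven2", child(project, "groupId"),
                child(project, "artifactId"), child(project, "version"));
        }
        if (child(project, "pomVersion") != null) {
            // Maven 1 project.xml: the version is in <currentVersion>.
            return new ProjectInfo("maven1", child(project, "groupId"),
                child(project, "artifactId"), child(project, "currentVersion"));
        }
        throw new IllegalArgumentException("Neither <modelVersion> nor <pomVersion> found");
    }

    /** Trimmed text of the first direct child element with the given name, or null. */
    private static String child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```

Checking for the version element first keeps the maven 1 / maven 2 decision in one place, which is the "anonymous" part of the idea: the caller does not need to know which kind of pom it is holding.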

Example of new RepositoryContent interface.

--(snip)--
package org.apache.maven.archiva.repository;

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

import org.apache.maven.archiva.model.ArtifactReference;
import org.apache.maven.archiva.model.ProjectReference;
import org.apache.maven.archiva.model.VersionedReference;
import org.apache.maven.archiva.repository.layout.LayoutException;

import java.io.File;
import java.net.URL;
import java.util.List;

/**
 * RepositoryContent interface for interacting with a managed repository
 * in an abstract way, without the need for processing based on
 * filesystem paths, or working with the database.
 *
 * @author <a href="mailto:joakim@erdfelt.com">Joakim Erdfelt</a>
 * @version $Id$
 */
public interface RepositoryContent
{
    /**
     * Determines if the project referenced exists in the repository.
     *
     * @param reference the project reference to check for.
     * @return true if the project referenced exists.
     */
    public boolean hasContent( ProjectReference reference );

    /**
     * Determines if the version reference exists in the repository.
     *
     * @param reference the version reference to check for.
     * @return true if the version referenced exists.
     */
    public boolean hasContent( VersionedReference reference );

    /**
     * Determines if the artifact referenced exists in the repository.
     *
     * @param reference the artifact reference to check for.
     * @return true if the artifact referenced exists.
     */
    public boolean hasContent( ArtifactReference reference );

    /**
     * Given a repository relative path to a filename, return the
     * {@link ArtifactReference} object suitable for the path.
     *
     * @param path the path relative to the repository base dir for
     *        the artifact.
     * @return the {@link ArtifactReference} representing the path.
     *        (or null if path cannot be converted to a
     *        {@link ArtifactReference})
     * @throws LayoutException if there was a problem converting the
     *         path to an artifact.
     */
    public ArtifactReference toArtifactReference( String path )
        throws LayoutException;

    /**
     * Given an ArtifactReference, return the relative path to the
     * artifact.
     *
     * @param reference the artifact reference to use.
     * @return the relative path to the artifact.
     */
    public String toPath( ArtifactReference reference );

    /**
     * Given an ArtifactReference, return the file reference to the
     * artifact.
     *
     * @param reference the artifact reference to use.
     * @return the File for the artifact.
     */
    public File toFile( ArtifactReference reference );

    /**
     * Given an ArtifactReference, return the url to the artifact.
     *
     * @param reference the artifact reference to use.
     * @return the URL of the artifact.
     */
    public URL toURL( ArtifactReference reference );


    /**
     * Gather up the list of related artifacts to the ArtifactReference
     * provided. This typically includes the pom files, and those things
     * with classifiers (such as doc, source code, test libs, etc...)
     *
     * NOTE: Some layouts (such as maven 1 "legacy"), and remote
     * repositories are not compatible with this query.
     *
     * @param reference the reference to work off of.
     * @return the list of ArtifactReferences for related artifacts.
     * @throws ContentNotFoundException if the initial artifact reference
     *         does not exist within the repository.
     * @throws NotSupportedException if the layout or repository does not
     *         support this query.
     */
    public List<ArtifactReference> getRelatedArtifacts(
        ArtifactReference reference )
        throws ContentNotFoundException, NotSupportedException;

    /**
     * Given a specific VersionedReference, return the list of available
     * versions for that versioned reference.
     *
     * NOTE: This is really only useful when working with SNAPSHOTs.
     *       Not compatible with remote repositories.
     *
     * @param reference the versioned reference to work off of.
     * @return the list of versions found.
     * @throws ContentNotFoundException if the versioned reference does
     *         not exist within the repository.
     * @throws NotSupportedException if the layout or repository does not
     *         support this query.
     */
    public List<String> getVersions( VersionedReference reference )
        throws ContentNotFoundException, NotSupportedException;

    /**
     * Given a specific ProjectReference, return the list of available
     * versions for that project reference.
     *
     * @param reference the project reference to work off of.
     * @return the list of versions found for that project reference.
     * @throws ContentNotFoundException if the project reference does not
     *         exist within the repository.
     * @throws NotSupportedException if the layout or repository does not
     *         support this query.
     */
    public List<String> getVersions( ProjectReference reference )
        throws ContentNotFoundException, NotSupportedException;
}
--(snip)--
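Version discovery in a default layout is listed above as something that already works. For a local default-layout repository, getVersions( ProjectReference ) could plausibly be little more than a directory listing. The sketch below assumes a local filesystem repository; `DefaultLayoutVersions` is a hypothetical helper, not part of the proposed interface, and it returns an empty list where the interface would throw ContentNotFoundException.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: one way a default layout implementation might
// answer getVersions( ProjectReference ) for a local repository.
final class DefaultLayoutVersions {
    private DefaultLayoutVersions() {}

    /** Lists version directories under ${repoRoot}/${groupId as dirs}/${artifactId}/. */
    static List<String> getVersions(Path repoRoot, String groupId, String artifactId)
            throws IOException {
        Path projectDir = repoRoot.resolve(groupId.replace('.', '/')).resolve(artifactId);
        List<String> versions = new ArrayList<>();
        if (!Files.isDirectory(projectDir)) {
            return versions; // the proposed interface would throw ContentNotFoundException
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(projectDir)) {
            for (Path entry : entries) {
                // Files such as maven-metadata.xml sit beside the version dirs; skip them.
                if (Files.isDirectory(entry)) {
                    versions.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(versions); // lexical order only, not maven version ordering
        return versions;
    }
}
```

The legacy layout has no per-project directory, which is one reason the interface documents NotSupportedException for these queries.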

I feel this is a better long-term solution for the persistent layout
parsing issues we have within Archiva.  However, not all of the problems
have been solved.  I've outlined above the ones that need help, but I'm
sure others have been overlooked.

Disclaimer: Yes, this is in the form of a proposal, but I'm already
working on this, and will continue down this path unless
someone here has a strong objection about this approach.

-- 
- Joakim Erdfelt
  Committer and PMC Member, Apache Maven
  Archiva Developer
  joakime@apache.org

