archiva-dev mailing list archives

From Joakim Erdfelt <joa...@erdfelt.com>
Subject Re: [Proposal] Repository Layout Detection/Interaction Changes.
Date Sat, 06 Oct 2007 15:15:34 GMT
Hmm, you are correct.

The shortest path I can think of is junit/junit/3.8.1/junit-3.8.1.jar,
which would be 4 parts, no?

>  4. If 3 parts ( dir/dir/filename ) then
>     4.1. If part 2 name ends in "s" then test for potential legacy layout.
>          4.1.1. Identify filename extension.
>          4.1.2. Get potential list of artifact types for extension.
>          4.1.3. If part 2 (minus the end "s")  is in the list of
>                 artifact types == legacy layout
>     4.2. Can't be legacy, then hand off to default layout.

Let's change 4.2 to read ...
4.2.  Invalid legacy layout.
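
A minimal Java sketch of the detection steps quoted above, with 4.2
revised so that a 3-part path failing the legacy test is reported as
invalid. LayoutDetector, its Layout enum, and the small
extension-to-type table are hypothetical names for illustration; the
table stands in for exactly the extension/type mapping that is still an
open problem. It assumes repository-relative paths with no leading "/".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed layout detection rules
 * (4.2 revised: a 3-part path that fails the legacy test is invalid).
 */
final class LayoutDetector
{
    enum Layout
    {
        DEFAULT, LEGACY, INVALID
    }

    // Illustrative extension -> artifact types table (assumption, not Archiva's real list).
    private static final Map<String, List<String>> TYPES_BY_EXTENSION = new HashMap<>();

    static
    {
        TYPES_BY_EXTENSION.put( "jar", Arrays.asList( "jar", "ejb", "javadoc.jar", "java-source" ) );
        TYPES_BY_EXTENSION.put( "war", Arrays.asList( "war" ) );
        TYPES_BY_EXTENSION.put( "pom", Arrays.asList( "pom" ) );
    }

    static Layout detect( String path )
    {
        // 1. Split the path by directory separators.
        String[] parts = path.split( "/" );

        // 2. More than 3 parts: default layout.
        if ( parts.length > 3 )
        {
            return Layout.DEFAULT;
        }
        // 3. Fewer than 3 parts: invalid path.
        if ( parts.length < 3 )
        {
            return Layout.INVALID;
        }

        // 4. Exactly 3 parts ( dir/dir/filename ).
        String typeDir = parts[1];
        String filename = parts[2];
        int dot = filename.lastIndexOf( '.' );

        // 4.1. Part 2 must end in "s" and the filename must have an extension.
        if ( typeDir.endsWith( "s" ) && dot > 0 )
        {
            // 4.1.1. Identify the filename extension.
            String extension = filename.substring( dot + 1 );
            // 4.1.2. Look up the candidate artifact types for that extension.
            List<String> candidates =
                TYPES_BY_EXTENSION.getOrDefault( extension, Collections.emptyList() );
            // 4.1.3. Part 2 minus the trailing "s" must be one of the candidates.
            String type = typeDir.substring( 0, typeDir.length() - 1 );
            if ( candidates.contains( type ) )
            {
                return Layout.LEGACY;
            }
        }

        // 4.2. (revised) Not a recognizable legacy path: invalid.
        return Layout.INVALID;
    }
}
```

With this table, commons-lang/jars/commons-lang-2.1.jar is detected as
legacy, junit/junit/3.8.1/junit-3.8.1.jar as default, and
org.apache.maven.test/jars/artifactId-1.0.war as invalid, because "jar"
is not a known type for the "war" extension.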

- Joakim

nicolas de loof wrote:
> Just one question about the proposed logic for detecting layout.
>
> 4 If 3 parts ( dir/dir/filename ) then
> ...
>
> Is there any case where a 3-part path can be a maven2 path?
>
> Nico.
>
> 2007/10/5, Joakim Erdfelt <joakime@apache.org>:
>   
>> This is a long email, read it all before commenting, and you'll likely
>> see a response to your earlier questions. :-)
>>
>> I'm currently working on MRM-432 and MRM-519, and I'm in the middle of an
>> important change to how Archiva handles Layout detection, interaction,
>> and parsing.
>>
>> :Background:
>>
>> Layouts in Archiva have 2 main purposes.
>>
>> 1. to convert a path to an artifact reference.
>> 2. to convert an artifact reference to a path.
>>
>> Layouts are used by the following.
>>
>> 1. The "/repository/${repoid}/" urls use layouts to determine the
>>     Artifact Reference that the client is requesting.
>>     The "/repository/" url is layout-neutral: maven 1 clients can ask
>>     for content in legacy format, and maven 2 clients can ask for
>>     content in default layout.
>> 2. Proxy requests out to remote repositories utilize layouts to take
>>     an internal Artifact Reference, convert it to a path appropriate
>>     to the remote layout configuration and obtain the content that is
>>     desired.
>> 3. Simple Consumers utilize layouts to obtain File references and
>>     Artifact References to the repository content, so that they can
>>     operate on that content as they need to.
>> 4. Complex consumers (such as metadata updater, and snapshots purge)
>>     utilize layouts to obtain lists of versions and artifacts.
>>
>> What Works.
>>
>> * Converting an Artifact Reference to a path.
>> * Discovering Versions in a default layout.
>>    (needed by metadata update / snapshot purge)
>> * Converting a default layout path to an Artifact Reference correctly.
>>
>> What Doesn't Work.
>>
>> * Detecting the layout in use 100% of the time.
>> * Converting a legacy layout path to an Artifact Reference 100% of
>>    the time.
>> * Discovering versions in a legacy layout.
>>    (do we need metadata update / snapshot purge here?)
>> * Reporting problems correctly.
>>
>> :The Problem:
>>
>> We cannot consistently parse useful information from every provided
>> path. Specifically, we need to glean the following from the path:
>>
>> * Layout Type (default / legacy)
>> * Group ID
>> * Artifact ID
>> * Version (Deployed version & Base version)
>> * Classifier (Not applicable in legacy layout)
>> * Type (Not the same as Extension)
>>
>> Example Paths: (included in this email for discussion, actual list
>>                from test cases)
>>
>> groupId/jars/-1.0.jar
>> org.apache.maven.test/jars/artifactId-1.0.war
>> ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
>> javax/jars/comm-3.0-u1.jar
>> javax.persistence/jars/ejb-3.0-public_review.jar
>> maven/jars/maven-test-plugin-1.8.2.jar
>> commons-lang/jars/commons-lang-2.1.jar
>> org.apache.derby/jars/derby-10.2.2.0.jar
>> com.foo/ejbs/foo-client-1.0.jar
>> com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar
>> com.foo.lib/java-sources/foo-lib-2.1-alpha-1-sources.jar
>> com.foo/jars/foo-tool-1.0.jar
>> org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
>> directory-clients/poms/ldap-clients-0.9.1-SNAPSHOT.pom
>> org.apache.archiva.test/jars/redonkulous-3.1-beta-1-20050831.101112-42.jar
>> invalid/invalid/1.0-20050611.123456-1/invalid-1.0-20050611.123456-1.jar
>> ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar
>> javax/comm/3.0-u1/comm-3.0-u1.jar
>> javax/persistence/ejb/3.0-public_review/ejb-3.0-public_review.jar
>> maven/maven-test-plugin/1.8.2/maven-test-plugin-1.8.2.pom
>> test/maven-arch/test-arch/2.0.3-SNAPSHOT/test-arch-2.0.3-SNAPSHOT.pom
>> com/company/department/com.company.department/0.2/com.company.department-0.2.pom
>> com/company/department/com.company.department.project/0.3/com.company.department.project-0.3.pom
>> com/foo/foo-tool/1.0/foo-tool-1.0.jar
>> commons-lang/commons-lang/2.1/commons-lang-2.1.jar
>> com/foo/foo-client/1.0/foo-client-1.0.jar
>> com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar
>> org/apache/archiva/test/redonkulous/3.1-beta-1-SNAPSHOT/redonkulous-3.1-beta-1-20050831.101112-42.jar
>>
>> :Proposal:
>>
>> The proposed logic for detecting layout.
>>
>> 1. Split path by directory separators.
>> 2. If more than 3 parts ( dir/dir/dir/dir/filename ) == default layout.
>> 3. If fewer than 3 parts ( dir/filename ) == invalid path.
>> 4. If 3 parts ( dir/dir/filename ) then
>>     4.1. If part 2 name ends in "s" then test for potential legacy layout.
>>          4.1.1. Identify filename extension.
>>          4.1.2. Get potential list of artifact types for extension.
>>          4.1.3. If part 2 (minus the end "s")  is in the list of
>>                 artifact types == legacy layout
>>     4.2. Can't be legacy, then hand off to default layout.
>>
>> The problem with this approach is maintaining the mapping of extensions
>> to artifact types.  The artifact type is arbitrary, and users can add
>> types that we can't even imagine today.
>> (See MRM-481: issue with extension .xml.zip)
>>
>> The proposed logic for parsing default layout paths.
>>
>> This one is easy.  Paths are in the following format ...
>>
>> ${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}-${classifier}.${type}
>>
>> Once we separate the directories from the filename, we get the
>> following order.
>>
>> dirs[dirs.length-1] = base version.
>> dirs[dirs.length-2] = artifact Id.
>> dirs[0] thru dirs[dirs.length-3] = groupId.
>>
>> That gives us the crucial pieces in the filename
>> ${artifactId}-${version}, which makes detecting the classifier and
>> type easy enough.
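
A minimal sketch of the default layout parsing described above. It
assumes repository-relative paths, timestamped snapshot versions of the
form yyyyMMdd.HHmmss-buildNumber, and classifiers that contain no dots.
DefaultPathParser and ParsedArtifact are hypothetical names;
IllegalArgumentException stands in for LayoutException, and the trailing
part is returned as a file extension, since mapping extension (plus
classifier) to an artifact type is a separate step.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: parse a default layout path of the form
 * ${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}[-${classifier}].${extension}
 */
final class DefaultPathParser
{
    static final class ParsedArtifact
    {
        final String groupId, artifactId, baseVersion, version;
        final String classifier; // null when the filename has no classifier
        final String extension;

        ParsedArtifact( String groupId, String artifactId, String baseVersion, String version,
                        String classifier, String extension )
        {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.baseVersion = baseVersion;
            this.version = version;
            this.classifier = classifier;
            this.extension = extension;
        }
    }

    // Assumed timestamped snapshot suffix: yyyyMMdd.HHmmss-buildNumber
    private static final String TIMESTAMP = "\\d{8}\\.\\d{6}-\\d+";

    static ParsedArtifact parse( String path )
    {
        String[] parts = path.split( "/" );
        if ( parts.length < 4 )
        {
            throw new IllegalArgumentException( "Not a default layout path: " + path );
        }

        // Separate the directories from the filename.
        String filename = parts[parts.length - 1];
        String[] dirs = Arrays.copyOf( parts, parts.length - 1 );
        String baseVersion = dirs[dirs.length - 1];
        String artifactId = dirs[dirs.length - 2];
        String groupId = String.join( ".", Arrays.copyOf( dirs, dirs.length - 2 ) );

        // The filename must start with "${artifactId}-".
        String prefix = artifactId + "-";
        if ( !filename.startsWith( prefix ) )
        {
            throw new IllegalArgumentException( "Filename does not match artifactId: " + path );
        }
        String remaining = filename.substring( prefix.length() );

        // The version is the base version itself or, for snapshots, a timestamped version.
        String version;
        if ( remaining.startsWith( baseVersion ) )
        {
            version = baseVersion;
        }
        else if ( baseVersion.endsWith( "-SNAPSHOT" ) )
        {
            String stem = baseVersion.substring( 0, baseVersion.length() - "SNAPSHOT".length() );
            Matcher m = Pattern.compile( Pattern.quote( stem ) + TIMESTAMP ).matcher( remaining );
            if ( !m.lookingAt() )
            {
                throw new IllegalArgumentException( "Unrecognized snapshot version: " + path );
            }
            version = m.group();
        }
        else
        {
            throw new IllegalArgumentException( "Version does not match base version: " + path );
        }

        // What is left is "-${classifier}.${extension}" or ".${extension}".
        String rest = remaining.substring( version.length() );
        int dot = rest.indexOf( '.' );
        String classifier = null;
        if ( rest.startsWith( "-" ) && dot > 1 )
        {
            classifier = rest.substring( 1, dot );
        }
        else if ( dot != 0 )
        {
            throw new IllegalArgumentException( "Cannot split classifier/extension: " + path );
        }
        String extension = rest.substring( dot + 1 );
        if ( extension.isEmpty() )
        {
            throw new IllegalArgumentException( "Missing extension: " + path );
        }
        return new ParsedArtifact( groupId, artifactId, baseVersion, version, classifier, extension );
    }
}
```

For the redonkulous snapshot path in the examples, this yields groupId
org.apache.archiva.test, base version 3.1-beta-1-SNAPSHOT and version
3.1-beta-1-20050831.101112-42, with no classifier and extension "jar".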
>>
>> The proposed logic for parsing legacy layout paths.
>>
>> Legacy layouts are tricky.  Using the path alone, it is nearly
>> impossible to determine the correct artifactId or version.  So the
>> process will need to read the pom file associated with the artifact
>> in order to determine the correct Artifact Reference pieces.
>>
>> The problem with this approach is that we now need 2 pieces of
>> information, the repository root (location or url) and the path.
>> Plus we incur a hit / read of the pom file.
>>
>> So, if we use the pseudo-pattern ...
>> [:groupId:]/[:type:]s/[:filename:].[:ext:]
>> as a starting point, swap out the [:type:] and [:ext:] for "pom" and
>> load the pom from the actual repository to determine the groupId,
>> artifactId, and version, we can then have a valid Artifact Reference.
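
A minimal sketch of the path swap described above: it turns a legacy
artifact path into the path of its pom and resolves that against the
repository root, which is the second piece of information the legacy
parser needs. LegacyPomLocator is a hypothetical name. It applies the
pseudo-pattern literally, so it does not strip classifiers (a
...-javadoc.jar maps to a ...-javadoc.pom that likely does not exist),
and it treats only the text after the last dot as [:ext:], so
multi-part extensions such as .xml.zip (MRM-481) are not handled.

```java
import java.io.File;

/**
 * Hypothetical sketch: map [:groupId:]/[:type:]s/[:filename:].[:ext:]
 * to [:groupId:]/poms/[:filename:].pom.
 */
final class LegacyPomLocator
{
    static String toPomPath( String legacyPath )
    {
        String[] parts = legacyPath.split( "/" );
        if ( parts.length != 3 || !parts[1].endsWith( "s" ) )
        {
            throw new IllegalArgumentException( "Not a legacy layout path: " + legacyPath );
        }

        // Only the text after the last dot is treated as [:ext:].
        String filename = parts[2];
        int dot = filename.lastIndexOf( '.' );
        if ( dot <= 0 )
        {
            throw new IllegalArgumentException( "Filename has no extension: " + legacyPath );
        }

        // Swap [:type:] and [:ext:] for "pom"; the groupId directory is kept as-is.
        return parts[0] + "/poms/" + filename.substring( 0, dot ) + ".pom";
    }

    /** Resolves the pom path against the repository root directory. */
    static File toPomFile( File repositoryRoot, String legacyPath )
    {
        return new File( repositoryRoot, toPomPath( legacyPath ) );
    }
}
```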
>>
>> The problem with relying on the pom is that it becomes required for
>> legacy layout "from path" logic.  This breaks the assumption that poms
>> are optional, and it changes the interface of the layout objects so
>> that they also need a repository.
>>
>> The proposed changes to the codebase.
>>
>> * Eliminate RepositoryLayoutUtils, roll layout-specific filename
>>    parsing routines into their respective layouts.
>> * Eliminate direct usage of BidirectionalRepositoryLayout by
>>    consumers.
>> * Create RepositoryContentRequest that takes the freeform requests
>>    arriving from the "/repository/" urls and puts them through
>>    the logic outlined above.
>> * Rename BidirectionalRepositoryLayout interface to RepositoryContent
>>    to simplify name and represent new role of accessing repository
>>    content that requires a repository reference.
>> * Create DefaultRepositoryContent and LegacyRepositoryContent
>>    implementations that utilize techniques described above, and
>>    logic already present in DefaultBidirectionalRepositoryLayout and
>>    LegacyBidirectionalRepositoryLayout.
>> * Create AnonymousProjectReader that takes a File object pointing to
>>    a pom, reads the <pomVersion> or <modelVersion> elements and loads
>>    the pom information as appropriate.
>> * Create RepositoryContentFactory that returns a RepositoryContent
>>    implementation for the provided repository id.
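
The proposed AnonymousProjectReader has to decide which pom model it is
looking at before loading it.  A minimal sketch of just that first step,
using the JDK's DOM parser and assuming a maven 1 pom has a <pomVersion>
child of <project> while a maven 2 pom has a <modelVersion> child.
PomGenerationSniffer and PomGeneration are hypothetical names; actually
loading the model is left out.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Hypothetical sketch: tell a maven 1 pom (<pomVersion>) from a maven 2 pom (<modelVersion>).
 */
final class PomGenerationSniffer
{
    enum PomGeneration
    {
        MAVEN_1, MAVEN_2, UNKNOWN
    }

    static PomGeneration detect( InputStream pom )
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace-aware so getLocalName() works for both un-namespaced
            // maven 1 poms and maven 2 poms in the POM 4.0.0 namespace.
            factory.setNamespaceAware( true );
            Element project = factory.newDocumentBuilder().parse( pom ).getDocumentElement();

            // Only direct children of <project> are inspected.
            NodeList children = project.getChildNodes();
            for ( int i = 0; i < children.getLength(); i++ )
            {
                Node child = children.item( i );
                if ( child.getNodeType() != Node.ELEMENT_NODE )
                {
                    continue;
                }
                if ( "pomVersion".equals( child.getLocalName() ) )
                {
                    return PomGeneration.MAVEN_1;
                }
                if ( "modelVersion".equals( child.getLocalName() ) )
                {
                    return PomGeneration.MAVEN_2;
                }
            }
            return PomGeneration.UNKNOWN;
        }
        catch ( ParserConfigurationException | SAXException | IOException e )
        {
            throw new IllegalArgumentException( "Unable to read pom", e );
        }
    }

    static PomGeneration detect( File pomFile ) throws IOException
    {
        try ( InputStream in = new FileInputStream( pomFile ) )
        {
            return detect( in );
        }
    }
}
```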
>>
>> Example of new RepositoryContent interface.
>>
>> --(snip)--
>> package org.apache.maven.archiva.repository;
>>
>> /*
>> * Licensed to the Apache Software Foundation (ASF) under one
>> * or more contributor license agreements.  See the NOTICE file
>> * distributed with this work for additional information
>> * regarding copyright ownership.  The ASF licenses this file
>> * to you under the Apache License, Version 2.0 (the
>> * "License"); you may not use this file except in compliance
>> * with the License.  You may obtain a copy of the License at
>> *
>> *  http://www.apache.org/licenses/LICENSE-2.0
>> *
>> * Unless required by applicable law or agreed to in writing,
>> * software distributed under the License is distributed on an
>> * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
>> * KIND, either express or implied.  See the License for the
>> * specific language governing permissions and limitations
>> * under the License.
>> */
>>
>> import org.apache.maven.archiva.model.ArtifactReference;
>> import org.apache.maven.archiva.model.ProjectReference;
>> import org.apache.maven.archiva.model.VersionedReference;
>> import org.apache.maven.archiva.repository.layout.LayoutException;
>>
>> import java.io.File;
>> import java.net.URL;
>> import java.util.List;
>>
>> /**
>> * RepositoryContent interface for interacting with a managed repository
>> * in an abstract way, without the need for processing based on
>> * filesystem paths, or working with the database.
>> *
>> * @author <a href="mailto:joakim@erdfelt.com">Joakim Erdfelt</a>
>> * @version $Id$
>> */
>> public interface RepositoryContent
>> {
>>    /**
>>     * Determines if the project referenced exists in the repository.
>>     *
>>     * @param reference the project reference to check for.
>>     * @return true if the project referenced exists.
>>     */
>>    public boolean hasContent( ProjectReference reference );
>>
>>    /**
>>     * Determines if the version reference exists in the repository.
>>     *
>>     * @param reference the version reference to check for.
>>     * @return true if the version referenced exists.
>>     */
>>    public boolean hasContent( VersionedReference reference );
>>
>>    /**
>>     * Determines if the artifact referenced exists in the repository.
>>     *
>>     * @param reference the artifact reference to check for.
>>     * @return true if the artifact referenced exists.
>>     */
>>    public boolean hasContent( ArtifactReference reference );
>>
>>    /**
>>     * Given a repository relative path to a filename, return the
>>     * {@link ArtifactReference} object suitable for the path.
>>     *
>>     * @param path the path relative to the repository base dir for
>>     *        the artifact.
>>     * @return the {@link ArtifactReference} representing the path,
>>     *        or null if the path cannot be converted to an
>>     *        {@link ArtifactReference}.
>>     * @throws LayoutException if there was a problem converting the
>>     *         path to an artifact.
>>     */
>>    public ArtifactReference toArtifactReference( String path )
>>        throws LayoutException;
>>
>>    /**
>>     * Given an ArtifactReference, return the relative path to the
>>     * artifact.
>>     *
>>     * @param reference the artifact reference to use.
>>     * @return the relative path to the artifact.
>>     */
>>    public String toPath( ArtifactReference reference );
>>
>>    /**
>>     * Given an ArtifactReference, return the file reference to the
>>     * artifact.
>>     *
>>     * @param reference the artifact reference to use.
>>     * @return the file reference to the artifact.
>>     */
>>    public File toFile( ArtifactReference reference );
>>
>>    /**
>>     * Given an ArtifactReference, return the url to the artifact.
>>     *
>>     * @param reference the artifact reference to use.
>>     * @return the url to the artifact.
>>     */
>>    public URL toURL( ArtifactReference reference );
>>
>>
>>    /**
>>     * Gather up the list of related artifacts to the ArtifactReference
>>     * provided. This typically includes the pom files, and those things
>>     * with classifiers (such as doc, source code, test libs, etc...)
>>     *
>>     * NOTE: Some layouts (such as maven 1 "legacy"), and remote
>>     * repositories are not compatible with this query.
>>     *
>>     * @param reference the reference to work off of.
>>     * @return the list of ArtifactReferences for related artifacts.
>>     * @throws ContentNotFoundException if the initial artifact reference
>>     *         does not exist within the repository.
>>     * @throws NotSupportedException if the layout or repository does
>>     *         not support this query.
>>     */
>>    public List<ArtifactReference> getRelatedArtifacts(
>>        ArtifactReference reference )
>>        throws ContentNotFoundException, NotSupportedException;
>>
>>    /**
>>     * Given a specific VersionedReference, return the list of available
>>     * versions for that versioned reference.
>>     *
>>     * NOTE: This is really only useful when working with SNAPSHOTs.
>>     *       Not compatible with remote repositories.
>>     *
>>     * @param reference the versioned reference to work off of.
>>     * @return the list of versions found.
>>     * @throws ContentNotFoundException if the versioned reference does
>>     *         not exist within the repository.
>>     * @throws NotSupportedException if the layout or repository does
>>     *         not support this query.
>>     */
>>    public List<String> getVersions( VersionedReference reference )
>>        throws ContentNotFoundException, NotSupportedException;
>>
>>    /**
>>     * Given a specific ProjectReference, return the list of available
>>     * versions for that project reference.
>>     *
>>     * @param reference the project reference to work off of.
>>     * @return the list of versions found for that project reference.
>>     * @throws ContentNotFoundException if the project reference does not
>>     *         exist within the repository.
>>     */
>>    public List<String> getVersions( ProjectReference reference )
>>        throws ContentNotFoundException, NotSupportedException;
>> }
>> --(snip)--
>>
>> I feel this is a better long-term solution for the persistent layout
>> parsing issues we have within Archiva.  However, not all of the problems
>> have been solved.  I've outlined the ones that need help above in this
>> email, but I'm sure there are ones that have been overlooked.
>>
>> Disclaimer: Yes, this is in the form of a proposal, but I'm already
>> working on this, and will continue down this path unless
>> someone here has a strong objection about this approach.
>>
>> --
>> - Joakim Erdfelt
>> Committer and PMC Member, Apache Maven
>> Archiva Developer
>> joakime@apache.org
>>
>>
>>     
>
>   


-- 
- Joakim Erdfelt
  joakim@erdfelt.com
  Open Source Software (OSS) Developer

