archiva-dev mailing list archives

From Joakim Erdfelt <joa...@erdfelt.com>
Subject Re: [Proposal] Repository Layout Detection/Interaction Changes.
Date Tue, 09 Oct 2007 14:30:03 GMT
I think you are missing the core point of the proposal.

(Is Nicolas the only one that understands the difficulty?)

Using *just* the path information, how do you get from a maven 1 request 
to an artifact reference?
(groupId, artifactId, version, type)

ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
maven/jars/maven-test-plugin-1.8.2.jar
org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar

These are some examples of the problems that arise.
Using that magic regex that you mentioned (which we use in archiva too!) 
we get a 1 to many split from path to reference.

(using "|" syntax for examples of references below 
groupId|artifactId|version|type)

ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
   becomes one of the following
ch.ethz.ganymed|ganymed|ssh2-build210|jar
ch.ethz.ganymed|ganymed-ssh2|build210|jar

maven/jars/maven-test-plugin-1.8.2.jar
   becomes one of the following
maven|maven|test-plugin-1.8.2|jar
maven|maven-test-plugin|1.8.2|jar

org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
   becomes one of the following
org.apache.geronimo.specs|geronimo|ejb_2.1_spec-1.0.1|jar
org.apache.geronimo.specs|geronimo-ejb|2.1_spec-1.0.1|jar
org.apache.geronimo.specs|geronimo-ejb_2.1|spec-1.0.1|jar
org.apache.geronimo.specs|geronimo-ejb_2.1_spec|1.0.1|jar
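
To make the 1 to many split concrete, here is a minimal sketch (not code
from archiva or maven) that cuts an extension-less legacy filename at every
hyphen and lists each resulting artifactId|version candidate. The class and
method names are made up for illustration, and it only treats "-" as a
boundary, so it reproduces the ganymed list above but not the hand-written
lists for the other two examples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class LegacySplitCandidates
{
    /**
     * Lists every "artifactId|version" pair obtained by cutting the
     * extension-less filename at one of its hyphens.
     */
    static List<String> candidates( String filename )
    {
        List<String> result = new ArrayList<>();
        int idx = filename.indexOf( '-' );
        while ( idx >= 0 )
        {
            // Skip cuts that would leave an empty artifactId or version.
            if ( idx > 0 && idx < filename.length() - 1 )
            {
                result.add( filename.substring( 0, idx ) + "|" + filename.substring( idx + 1 ) );
            }
            idx = filename.indexOf( '-', idx + 1 );
        }
        return result;
    }

    public static void main( String[] args )
    {
        for ( String candidate : candidates( "ganymed-ssh2-build210" ) )
        {
            System.out.println( candidate );
        }
    }
}
```

Nothing in the filename says which candidate is the right one, which is
the whole problem.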

The process to get from maven 1 legacy request to a reference is not 
possible 100% of the time.
And for the record, there is no code in maven 1 or maven 2 that does 
this (takes a path and returns an artifact reference). I've looked dozens 
of times; it just doesn't exist.

The initial proposal was to eliminate this 1 to many problem by reading 
the pom file for the information regarding the groupId / artifactId / 
version, but that isn't a valid solution when working with content that 
needs to be proxied.

- Joakim

Brett Porter wrote:
> It took me a while to digest, but I think this is being over-thought.
>
> I have some questions I need answering before I fully understand:
> - "Discovering versions in a legacy layout. (do we need metadata 
> update / snapshot purge here?)" -- not sure if it's what you meant, 
> but no I don't think we need these so does that mean the problem isn't 
> relevant?
> - What are the reporting problems you refer to?
> - why is classifier not applicable in legacy?
>
> For the proposal, it looks ok (with the adjustments Nicolas made), but 
> it also looks like what the code already does?
>
> Some more things, just in point form...
>
> Since it's problematic, why enumerate the artifact types? In m1, we 
> can determine this purely on the filename extension so it's easy to 
> detect.
>
> " It is nearly impossible to detect, using the path alone, the correct 
> artifactId or version" -- to address this in the regexes for the 
> central repository we have a regex and some special cases. I think 
> that might be suitable in this case (even if they have to be hand 
> configured). The ones we have already determined should take care of 
> central. I don't think you can rely on reading the POM - it may not 
> exist, especially if you are mapping to another m1 repository, as you 
> said. I would be against doing this - it's probably what is causing 
> most of the over-complication I see here.
>
> I think the last part is key, as if we aren't re-reading the POM, I 
> believe the code changes you discussed, and the 2 use cases in your 
> second mail are irrelevant, is that right?
>
> Just to wrap it up, you are correct about the first use case in your 
> second mail - maven-metadata.xml requests are not in the legacy layout.
>
> Cheers,
> Brett
>
>
> On 05/10/2007, at 11:58 PM, Joakim Erdfelt wrote:
>
>>
>> This is a long email, read it all before commenting, and you'll likely
>> see a response to your earlier questions. :-)
>>
>> I'm currently working on MRM-432 and MRM-519, and I'm in the middle of
>> an important change to how Archiva handles Layout detection,
>> interaction, and parsing.
>>
>> :Background:
>>
>> Layouts in Archiva have 2 main purposes.
>>
>>  1. to convert a path to an artifact reference.
>>  2. to convert an artifact reference to a path.
>>
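The second purpose (reference to path) is the direction that works for
both layouts. A minimal sketch of it, under simplifying assumptions that
are mine and not the proposal's: no classifier, the type is used directly
as the file extension, and the version is also the base version (no
timestamped snapshots). The class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only.
class ReferenceToPath
{
    /** Default (maven 2) layout: groupId dots become directories. */
    static String toDefaultPath( String groupId, String artifactId, String version, String type )
    {
        return groupId.replace( '.', '/' ) + "/" + artifactId + "/" + version + "/"
            + artifactId + "-" + version + "." + type;
    }

    /** Legacy (maven 1) layout: groupId stays as one directory, followed by the type plus "s". */
    static String toLegacyPath( String groupId, String artifactId, String version, String type )
    {
        return groupId + "/" + type + "s/" + artifactId + "-" + version + "." + type;
    }

    public static void main( String[] args )
    {
        System.out.println( toDefaultPath( "ch.ethz.ganymed", "ganymed-ssh2", "build210", "jar" ) );
        System.out.println( toLegacyPath( "ch.ethz.ganymed", "ganymed-ssh2", "build210", "jar" ) );
    }
}
```

Both functions simply concatenate known pieces, which is why this
direction is unambiguous while the reverse is not.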
>> Layouts are used by the following.
>>
>>  1. The "/repository/${repoid}/" urls use layouts to determine the
>>     Artifact Reference that the client is requesting.
>>     The "/repository/" url is layout neutral, and can have maven 1
>>     clients ask for content in legacy format, or maven 2 clients ask
>>     for content in default layout.
>>  2. Proxy requests out to remote repositories utilize layouts to take
>>     an internal Artifact Reference, convert it to a path appropriate
>>     to the remote layout configuration and obtain the content that is
>>     desired.
>>  3. Simple Consumers utilize layouts to obtain File references, and
>>     Artifact References to the repository content for purposes of
>>     operating on the content in a way that they desire.
>>  4. Complex consumers (such as metadata updater, and snapshots purge)
>>     utilize layouts to obtain lists of versions and artifacts.
>>
>> What Works.
>>
>>  * Converting an Artifact Reference to a path.
>>  * Discovering Versions in a default layout.
>>    (needed by metadata update / snapshot purge)
>>  * Converting a default layout path to an Artifact Reference correctly.
>>
>> What Doesn't Work.
>>
>>  * Detecting the layout in use 100% of the time.
>>  * Converting a legacy layout path to an Artifact Reference 100% of
>>    the time.
>>  * Discovering versions in a legacy layout.
>>    (do we need metadata update / snapshot purge here?)
>>  * Reporting problems correctly.
>>
>> :The Problem:
>>
>> The inability to parse useful information in a consistent way for all
>> provided paths.
>> Gleaning the following information from the path.
>>
>>  * Layout Type (default / legacy)
>>  * Group ID
>>  * Artifact ID
>>  * Version (Deployed version & Base version)
>>  * Classifier (Not applicable in legacy layout)
>>  * Type (Not the same as Extension)
>>
>> Example Paths: (included in this email for discussion, actual list
>>                from test cases)
>>
>> groupId/jars/-1.0.jar
>> org.apache.maven.test/jars/artifactId-1.0.war
>> ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
>> javax/jars/comm-3.0-u1.jar
>> javax.persistence/jars/ejb-3.0-public_review.jar
>> maven/jars/maven-test-plugin-1.8.2.jar
>> commons-lang/jars/commons-lang-2.1.jar
>> org.apache.derby/jars/derby-10.2.2.0.jar
>> com.foo/ejbs/foo-client-1.0.jar
>> com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar
>> com.foo.lib/java-sources/foo-lib-2.1-alpha-1-sources.jar
>> com.foo/jars/foo-tool-1.0.jar
>> org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
>> directory-clients/poms/ldap-clients-0.9.1-SNAPSHOT.pom
>> org.apache.archiva.test/jars/redonkulous-3.1-beta-1-20050831.101112-42.jar 
>>
>> invalid/invalid/1.0-20050611.123456-1/invalid-1.0-20050611.123456-1.jar
>> ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar
>> javax/comm/3.0-u1/comm-3.0-u1.jar
>> javax/persistence/ejb/3.0-public_review/ejb-3.0-public_review.jar
>> maven/maven-test-plugin/1.8.2/maven-test-plugin-1.8.2.pom
>> test/maven-arch/test-arch/2.0.3-SNAPSHOT/test-arch-2.0.3-SNAPSHOT.pom
>> com/company/department/com.company.department/0.2/com.company.department-0.2.pom
>> com/company/department/com.company.department.project/0.3/com.company.department.project-0.3.pom
>> com/foo/foo-tool/1.0/foo-tool-1.0.jar
>> commons-lang/commons-lang/2.1/commons-lang-2.1.jar
>> com/foo/foo-client/1.0/foo-client-1.0.jar
>> com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar
>> org/apache/archiva/test/redonkulous/3.1-beta-1-SNAPSHOT/redonkulous-3.1-beta-1-20050831.101112-42.jar
>>
>> :Proposal:
>>
>> The proposed logic for detecting layout.
>>
>>  1. Split path by directory separators.
>>  2. If more than 3 parts ( dir/dir/dir/dir/filename ) == default layout.
>>  3. If less than 3 parts ( dir/filename ) == invalid path.
>>  4. If 3 parts ( dir/dir/filename ) then
>>     4.1. If part 2 name ends in "s" then test for potential legacy
>>          layout.
>>          4.1.1. Identify filename extension.
>>          4.1.2. Get potential list of artifact types for extension.
>>          4.1.3. If part 2 (minus the end "s")  is in the list of
>>                 artifact types == legacy layout
>>     4.2. Can't be legacy, then hand off to default layout.
>>
>>  The problem with this approach is maintaining the list of extensions to
>>  artifact type.  The artifact type is arbitrary, and can be expanded
>>  upon by the user to include types that we can't even imagine today.
>>  (See MRM-481: issue with extension .xml.zip)
>>
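The detection steps above can be sketched roughly as follows. This is an
illustration, not Archiva code: the class, the enum, and especially the
extension to artifact type table are assumptions made up for the example
(the table is exactly the piece the proposal says is hard to maintain).
The extension is taken as everything after the last dot, which would
mishandle multi-part extensions such as .xml.zip.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed layout detection.
class LayoutDetector
{
    enum Layout { DEFAULT, LEGACY, INVALID }

    // Assumed extension -> artifact type table; real repositories can use types not listed here.
    private static final Map<String, List<String>> TYPES_BY_EXTENSION = new HashMap<>();
    static
    {
        TYPES_BY_EXTENSION.put( "jar", Arrays.asList( "jar", "ejb", "javadoc.jar", "java-source" ) );
        TYPES_BY_EXTENSION.put( "pom", Arrays.asList( "pom" ) );
        TYPES_BY_EXTENSION.put( "war", Arrays.asList( "war" ) );
    }

    static Layout detect( String path )
    {
        String[] parts = path.split( "/" );               // step 1
        if ( parts.length > 3 ) return Layout.DEFAULT;    // step 2
        if ( parts.length < 3 ) return Layout.INVALID;    // step 3

        String typeDir = parts[1];                        // step 4: dir/dir/filename
        String filename = parts[2];
        int dot = filename.lastIndexOf( '.' );
        if ( typeDir.endsWith( "s" ) && dot >= 0 )        // step 4.1
        {
            String extension = filename.substring( dot + 1 );                  // 4.1.1
            List<String> types = TYPES_BY_EXTENSION.get( extension );          // 4.1.2
            String type = typeDir.substring( 0, typeDir.length() - 1 );
            if ( types != null && types.contains( type ) ) return Layout.LEGACY; // 4.1.3
        }
        return Layout.DEFAULT;                            // step 4.2: hand off to default layout
    }
}
```

With this table, ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar is
detected as legacy, while org.apache.maven.test/jars/artifactId-1.0.war
falls through to the default layout because "jar" is not a type listed
for the "war" extension.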
>> The proposed logic for parsing default layout paths.
>>
>>  This one is easy.  paths are in the following format ...
>>
>>  ${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}-${classifier}.${type}
>>
>>  Once we separate out the directories from the filename, we get the
>>  following order.
>>
>>  dirs[dirs.length-1] = base version.
>>  dirs[dirs.length-2] = artifact Id.
>>  dirs[0] thru dirs[dirs.length-3] = groupId.
>>
>>  That gives us the crucial pieces in the filename
>>  ${artifactId}-${version}, which makes detecting the classifier and
>>  type easy enough.
>>
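A small sketch of the directory bookkeeping for default layout paths,
using zero-based Java indexing (so the last directory is
dirs[dirs.length - 1]). The class name and the returned array shape are
assumptions for illustration; classifier and type detection on the
filename are left out.

```java
import java.util.Arrays;

// Hypothetical sketch of default layout path parsing.
class DefaultPathParser
{
    /** Returns { groupId, artifactId, baseVersion, filename } for a default layout path. */
    static String[] parse( String path )
    {
        String[] parts = path.split( "/" );
        if ( parts.length < 4 )
        {
            throw new IllegalArgumentException( "Not a default layout path: " + path );
        }

        String filename = parts[parts.length - 1];
        String[] dirs = Arrays.copyOf( parts, parts.length - 1 );

        String baseVersion = dirs[dirs.length - 1];
        String artifactId = dirs[dirs.length - 2];
        // Everything before the artifactId directory is the groupId, joined with dots.
        String groupId = String.join( ".", Arrays.copyOfRange( dirs, 0, dirs.length - 2 ) );

        return new String[] { groupId, artifactId, baseVersion, filename };
    }

    public static void main( String[] args )
    {
        System.out.println( Arrays.toString(
            parse( "ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar" ) ) );
    }
}
```

Because the artifactId and base version come from the directories, the
filename no longer has to be guessed at the way it does in the legacy
layout.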
>> The proposed logic for parsing legacy layout paths.
>>
>>  Legacy layouts are tricky.  It is nearly impossible to detect, using
>>  the path alone, the correct artifactId or version.  So the process
>>  will need to read the pom file associated with the artifact Id in order
>>  to determine the correct Artifact Reference pieces.
>>
>>  The problem with this approach is that we now need 2 pieces of
>>  information, the repository root (location or url) and the path.
>>  Plus we incur a hit / read of the pom file.
>>
>>  So, if we use the pseudo-pattern ...
>>  [:groupId:]/[:type:]s/[:filename:].[:ext:]
>>  as a starting point, swap out the [:type:] and [:ext:] for "pom" and
>>  load the pom from the actual repository to determine the groupId,
>>  artifactId, and version, we can then have a valid Artifact Reference.
>>
>>  The problem with relying on the pom is that it is now required for
>>  legacy layout "from path" logic, this changes the assumption that poms
>>  are optional and not required, as well as changing the interface
>>  to the layout objects to needing a repository as well.
>>
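The "swap out the [:type:] and [:ext:] for pom" step could look roughly
like this. It is only a sketch with made-up names; it does not read the
pom, and it assumes the filename carries no classifier-like suffix (for
a file such as foo-lib-2.1-alpha-1-javadoc.jar it would produce a pom
path that most likely does not exist).

```java
// Hypothetical sketch: derive the legacy pom path for a legacy artifact path.
class LegacyPomPath
{
    static String pomPathFor( String legacyPath )
    {
        String[] parts = legacyPath.split( "/" );
        if ( parts.length != 3 )
        {
            throw new IllegalArgumentException( "Not a legacy layout path: " + legacyPath );
        }

        String groupId = parts[0];
        String filename = parts[2];

        // Drop the extension (everything after the last dot), if there is one.
        int dot = filename.lastIndexOf( '.' );
        String base = ( dot >= 0 ) ? filename.substring( 0, dot ) : filename;

        // [:groupId:]/poms/[:filename:].pom
        return groupId + "/poms/" + base + ".pom";
    }

    public static void main( String[] args )
    {
        System.out.println( pomPathFor( "maven/jars/maven-test-plugin-1.8.2.jar" ) );
    }
}
```

The resulting path still has to be resolved against a repository root
before the pom can be read, which is the extra dependency described
above.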
>> The proposed changes to the codebase.
>>
>>  * Eliminate RepositoryLayoutUtils, roll layout specific filename
>>    parsing routines into their respective layouts.
>>  * Eliminate direct usage of BidirectionalRepositoryLayout by
>>    consumers.
>>  * Create RepositoryContentRequest that takes the freeform requests
>>    arriving in from the "/repository/" urls and puts it through
>>    the logic as outlined above.
>>  * Rename BidirectionalRepositoryLayout interface to RepositoryContent
>>    to simplify name and represent new role of accessing repository
>>    content that requires a repository reference.
>>  * Create DefaultRepositoryContent and LegacyRepositoryContent
>>    implementations, that utilize techniques described above, and
>>    logic already present in DefaultBidirectionalRepositoryLayout and
>>    LegacyBidirectionalRepositoryLayout.
>>  * Create AnonymousProjectReader that takes a File object pointing to
>>    a pom, reads the <pomVersion> or <modelVersion> elements and loads
>>    the pom information as appropriate.
>>  * Create RepositoryContentFactory that returns a RepositoryContent
>>    implementation for the provided repository id.
>>
>> Example of new RepositoryContent interface.
>>
>> --(snip)--
>> package org.apache.maven.archiva.repository;
>>
>> /*
>> * Licensed to the Apache Software Foundation (ASF) under one
>> * or more contributor license agreements.  See the NOTICE file
>> * distributed with this work for additional information
>> * regarding copyright ownership.  The ASF licenses this file
>> * to you under the Apache License, Version 2.0 (the
>> * "License"); you may not use this file except in compliance
>> * with the License.  You may obtain a copy of the License at
>> *
>> *  http://www.apache.org/licenses/LICENSE-2.0
>> *
>> * Unless required by applicable law or agreed to in writing,
>> * software distributed under the License is distributed on an
>> * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
>> * KIND, either express or implied.  See the License for the
>> * specific language governing permissions and limitations
>> * under the License.
>> */
>>
>> import org.apache.maven.archiva.model.ArtifactReference;
>> import org.apache.maven.archiva.model.ProjectReference;
>> import org.apache.maven.archiva.model.VersionedReference;
>> import org.apache.maven.archiva.repository.layout.LayoutException;
>>
>> import java.io.File;
>> import java.net.URL;
>> import java.util.List;
>>
>> /**
>> * RepositoryContent interface for interacting with a managed repository
>> * in an abstract way, without the need for processing based on
>> * filesystem paths, or working with the database.
>> *
>> * @author <a href="mailto:joakim@erdfelt.com">Joakim Erdfelt</a>
>> * @version $Id$
>> */
>> public interface RepositoryContent
>> {
>>    /**
>>     * Determines if the project referenced exists in the repository.
>>     *
>>     * @param reference the project reference to check for.
>>     * @return true if the project referenced exists.
>>     */
>>    public boolean hasContent( ProjectReference reference );
>>
>>    /**
>>     * Determines if the version reference exists in the repository.
>>     *
>>     * @param reference the version reference to check for.
>>     * @return true if the version referenced exists.
>>     */
>>    public boolean hasContent( VersionedReference reference );
>>
>>    /**
>>     * Determines if the artifact referenced exists in the repository.
>>     *
>>     * @param reference the artifact reference to check for.
>>     * @return true if the artifact referenced exists.
>>     */
>>    public boolean hasContent( ArtifactReference reference );
>>
>>    /**
>>     * Given a repository relative path to a filename, return the
>>     * {@link ArtifactReference} object suitable for the path.
>>     *
>>     * @param path the path relative to the repository base dir for
>>     *        the artifact.
>>     * @return the {@link ArtifactReference} representing the path.
>>     *        (or null if path cannot be converted to a
>>     *        {@link ArtifactReference})
>>     * @throws LayoutException if there was a problem converting the
>>     *         path to an artifact.
>>     */
>>    public ArtifactReference toArtifactReference( String path )
>>        throws LayoutException;
>>
>>    /**
>>     * Given an ArtifactReference, return the relative path to the
>>     * artifact.
>>     *
>>     * @param reference the artifact reference to use.
>>     * @return the relative path to the artifact.
>>     */
>>    public String toPath( ArtifactReference reference );
>>
>>    /**
>>     * Given an ArtifactReference, return the file reference to the
>>     * artifact.
>>     *
>>     * @param reference the artifact reference to use.
>>     * @return the file reference to the artifact.
>>     */
>>    public File toFile( ArtifactReference reference );
>>
>>    /**
>>     * Given an ArtifactReference, return the url to the artifact.
>>     *
>>     * @param reference the artifact reference to use.
>>     * @return the url to the artifact.
>>     */
>>    public URL toURL( ArtifactReference reference );
>>
>>
>>    /**
>>     * Gather up the list of related artifacts to the ArtifactReference
>>     * provided. This typically includes the pom files, and those things
>>     * with classifiers (such as doc, source code, test libs, etc...)
>>     *
>>     * NOTE: Some layouts (such as maven 1 "legacy"), and remote
>>     * repositories are not compatible with this query.
>>     *
>>     * @param reference the reference to work off of.
>>     * @return the list of ArtifactReferences for related artifacts.
>>     * @throws ContentNotFoundException if the initial artifact reference
>>     *         does not exist within the repository.
>>     */
>>    public List<ArtifactReference> getRelatedArtifacts(
>>        ArtifactReference reference )
>>        throws ContentNotFoundException, NotSupportedException;
>>
>>    /**
>>     * Given a specific VersionedReference, return the list of available
>>     * versions for that versioned reference.
>>     *
>>     * NOTE: This is really only useful when working with SNAPSHOTs.
>>     *       Not compatible with remote repositories.
>>     *
>>     * @param reference the versioned reference to work off of.
>>     * @return the list of versions found.
>>     * @throws ContentNotFoundException if the versioned reference does
>>     *         not exist within the repository.
>>     */
>>    public List<String> getVersions( VersionedReference reference )
>>        throws ContentNotFoundException, NotSupportedException;
>>
>>    /**
>>     * Given a specific ProjectReference, return the list of available
>>     * versions for that project reference.
>>     *
>>     * @param reference the project reference to work off of.
>>     * @return the list of versions found for that project reference.
>>     * @throws ContentNotFoundException if the project reference does not
>>     *         exist within the repository.
>>     */
>>    public List<String> getVersions( ProjectReference reference )
>>        throws ContentNotFoundException, NotSupportedException;
>> }
>> --(snip)--
>>
>> I feel this is a better long term solution for the persistent layout
>> parsing issues we have within Archiva.  However not all of the problems
>> have been solved.  I've outlined the ones that need help above in this
>> email, but I'm sure there are ones that have been overlooked.
>>
>> Disclaimer: Yes, this is in the form of a proposal, but I'm already
>> working on this, and will continue down this path unless
>> someone here has a strong objection about this approach.
>>
>> -- 
>> - Joakim Erdfelt
>>  Committer and PMC Member, Apache Maven
>>  Archiva Developer
>>  joakime@apache.org
>
> -- 
> Brett Porter - brett@apache.org
> Blog: http://www.devzuz.org/blogs/bporter/
>


-- 
- Joakim Erdfelt
  joakim@erdfelt.com
  Open Source Software (OSS) Developer

