archiva-dev mailing list archives

From Brett Porter <br...@apache.org>
Subject Re: [Proposal] Repository Layout Detection/Interaction Changes.
Date Tue, 09 Oct 2007 12:41:40 GMT
It took me a while to digest, but I think this is being over-thought.

I have some questions I need answering before I fully understand:
- "Discovering versions in a legacy layout. (do we need metadata
update / snapshot purge here?)" -- not sure if it's what you meant,
but no, I don't think we need these. So does that mean the problem
isn't relevant?
- What are the reporting problems you refer to?
- why is classifier not applicable in legacy?

For the proposal, it looks ok (with the adjustments Nicolas made),  
but it also looks like what the code already does?

Some more things, just in point form...

Since it's problematic, why enumerate the artifact types? In m1, we
can determine this purely from the filename extension, so it's easy to
detect.

" It is nearly impossible to detect, using the path alone, the  
correct artifactId or version" -- to address this in the regexes for  
the central repository we have a regex and some special cases. I  
think that might be suitable in this case (even if they have to be  
hand configured). The ones we have already determined should take  
care of central. I don't think you can rely on reading the POM - it  
may not exist, especially if you are mapping to another m1  
repository, as you said. I would be against doing this - it's probably
what is causing most of the over-complication I see here.
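A minimal Java sketch of the "regex plus special cases" idea for legacy filenames. It assumes the version starts at the first hyphen followed by a digit; the class name, that rule, and the single special-case entry are hypothetical illustrations, not the regexes Archiva actually uses for central:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits an m1 (legacy) filename into artifactId and version.
final class LegacyFilenameParser {
    // General rule (an assumption): the version starts at the first "-" that is
    // followed by a digit, e.g. "commons-lang-2.1" -> "commons-lang" + "2.1".
    private static final Pattern ARTIFACT_VERSION = Pattern.compile("^(.+?)-(\\d.*)$");

    // Hand-configured exceptions the general rule cannot split (example entry only).
    private static final Map<String, String[]> SPECIAL_CASES = new HashMap<>();
    static {
        SPECIAL_CASES.put("ganymed-ssh2-build210", new String[] {"ganymed-ssh2", "build210"});
    }

    /** Returns {artifactId, version}, or null if the filename cannot be split. */
    static String[] split(String filename) {
        int dot = filename.lastIndexOf('.');
        // Drop the extension (".jar", ".pom", ...) before matching.
        String base = dot > 0 ? filename.substring(0, dot) : filename;
        String[] special = SPECIAL_CASES.get(base);
        if (special != null) {
            return special.clone();
        }
        Matcher m = ARTIFACT_VERSION.matcher(base);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }
}
```

For example, `split("geronimo-ejb_2.1_spec-1.0.1.jar")` gives `{"geronimo-ejb_2.1_spec", "1.0.1"}` and `split("-1.0.jar")` gives `null`. Classified files such as `foo-lib-2.1-alpha-1-javadoc.jar` would still leave `-javadoc` in the version, so they would need an extra rule.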

I think the last part is key: if we aren't re-reading the POM, I
believe the code changes you discussed and the 2 use cases in your
second mail are irrelevant. Is that right?

Just to wrap it up, you are correct about the first use case in your  
second mail - maven-metadata.xml requests are not in the legacy layout.

Cheers,
Brett


On 05/10/2007, at 11:58 PM, Joakim Erdfelt wrote:

>
> This is a long email, read it all before commenting, and you'll likely
> see a response to your earlier questions. :-)
>
> I'm currently working on MRM-432 and MRM-519, and I'm in the middle of an
> important change to how Archiva handles Layout detection, interaction,
> and parsing.
>
> :Background:
>
> Layouts in Archiva have 2 main purposes.
>
>  1. to convert a path to an artifact reference.
>  2. to convert an artifact reference to a path.
>
> Layouts are used by the following.
>
>  1. The "/repository/${repoid}/" urls use layouts to determine the
>     Artifact Reference that the client is requesting.
>     The "/repository/" url is layout neutral, and can have maven 1
>     clients ask for content in legacy format, or maven 2 clients ask
>     for content in default layout.
>  2. Proxy requests out to remote repositories utilize layouts to take
>     an internal Artifact Reference, convert it to a path appropriate
>     to the remote layout configuration and obtain the content that is
>     desired.
>  3. Simple Consumers utilize layouts to obtain File references, and
>     Artifact References to the repository content for purposes of
>     operating on the content in a way that they desire.
>  4. Complex consumers (such as the metadata updater and snapshot purge)
>     utilize layouts to obtain lists of versions and artifacts.
>
> What Works.
>
>  * Converting an Artifact Reference to a path.
>  * Discovering Versions in a default layout.
>    (needed by metadata update / snapshot purge)
>  * Converting a default layout path to an Artifact Reference correctly.
>
> What Doesn't Work.
>
>  * Detecting the layout in use 100% of the time.
>  * Converting a legacy layout path to an Artifact Reference 100% of
>    the time.
>  * Discovering versions in a legacy layout.
>    (do we need metadata update / snapshot purge here?)
>  * Reporting problems correctly.
>
> :The Problem:
>
> We are unable to consistently parse useful information from every
> provided path. Specifically, we need to glean the following
> information from the path:
>
>  * Layout Type (default / legacy)
>  * Group ID
>  * Artifact ID
>  * Version (Deployed version & Base version)
>  * Classifier (Not applicable in legacy layout)
>  * Type (Not the same as Extension)
>
> Example Paths: (included in this email for discussion, actual list
>                from test cases)
>
> groupId/jars/-1.0.jar
> org.apache.maven.test/jars/artifactId-1.0.war
> ch.ethz.ganymed/jars/ganymed-ssh2-build210.jar
> javax/jars/comm-3.0-u1.jar
> javax.persistence/jars/ejb-3.0-public_review.jar
> maven/jars/maven-test-plugin-1.8.2.jar
> commons-lang/jars/commons-lang-2.1.jar
> org.apache.derby/jars/derby-10.2.2.0.jar
> com.foo/ejbs/foo-client-1.0.jar
> com.foo.lib/javadoc.jars/foo-lib-2.1-alpha-1-javadoc.jar
> com.foo.lib/java-sources/foo-lib-2.1-alpha-1-sources.jar
> com.foo/jars/foo-tool-1.0.jar
> org.apache.geronimo.specs/jars/geronimo-ejb_2.1_spec-1.0.1.jar
> directory-clients/poms/ldap-clients-0.9.1-SNAPSHOT.pom
> org.apache.archiva.test/jars/redonkulous-3.1-beta-1-20050831.101112-42.jar
> invalid/invalid/1.0-20050611.123456-1/invalid-1.0-20050611.123456-1.jar
> ch/ethz/ganymed/ganymed-ssh2/build210/ganymed-ssh2-build210.jar
> javax/comm/3.0-u1/comm-3.0-u1.jar
> javax/persistence/ejb/3.0-public_review/ejb-3.0-public_review.jar
> maven/maven-test-plugin/1.8.2/maven-test-plugin-1.8.2.pom
> test/maven-arch/test-arch/2.0.3-SNAPSHOT/test-arch-2.0.3-SNAPSHOT.pom
> com/company/department/com.company.department/0.2/com.company.department-0.2.pom
> com/company/department/com.company.department.project/0.3/com.company.department.project-0.3.pom
> com/foo/foo-tool/1.0/foo-tool-1.0.jar
> commons-lang/commons-lang/2.1/commons-lang-2.1.jar
> com/foo/foo-client/1.0/foo-client-1.0.jar
> com/foo/lib/foo-lib/2.1-alpha-1/foo-lib-2.1-alpha-1-sources.jar
> org/apache/archiva/test/redonkulous/3.1-beta-1-SNAPSHOT/redonkulous-3.1-beta-1-20050831.101112-42.jar
>
> :Proposal:
>
> The proposed logic for detecting layout.
>
>  1. Split path by directory separators.
>  2. If more than 3 parts ( dir/dir/dir/dir/filename ) == default layout.
>  3. If fewer than 3 parts ( dir/filename ) == invalid path.
>  4. If exactly 3 parts ( dir/dir/filename ) then
>     4.1. If the part 2 name ends in "s", test for a potential legacy layout.
>          4.1.1. Identify the filename extension.
>          4.1.2. Get the potential list of artifact types for that extension.
>          4.1.3. If part 2 (minus the trailing "s") is in the list of
>                 artifact types == legacy layout.
>     4.2. Otherwise it can't be legacy; hand off to the default layout.
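A minimal Java sketch of the detection steps above. The extension-to-type table is a hypothetical stand-in for the hand-maintained list the proposal calls problematic, and the class, enum, and method names are illustrative only:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed layout-detection steps.
final class LayoutDetector {
    enum Layout { DEFAULT, LEGACY, INVALID }

    // Example extension -> artifact-type table (illustrative entries only).
    // Keeping this complete is the maintenance problem noted in MRM-481.
    private static final Map<String, Set<String>> TYPES_BY_EXTENSION = new HashMap<>();
    static {
        TYPES_BY_EXTENSION.put("jar", new HashSet<>(Arrays.asList("jar", "ejb", "javadoc.jar", "java-source")));
        TYPES_BY_EXTENSION.put("pom", Collections.singleton("pom"));
        TYPES_BY_EXTENSION.put("war", Collections.singleton("war"));
    }

    static Layout detect(String path) {
        // Step 1: split by directory separators, ignoring empty segments.
        String[] parts = Arrays.stream(path.split("[/\\\\]+"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
        if (parts.length > 3) {
            return Layout.DEFAULT;                      // step 2
        }
        if (parts.length < 3) {
            return Layout.INVALID;                      // step 3
        }
        String typeDir = parts[1];                      // step 4: exactly 3 parts
        String filename = parts[2];
        int dot = filename.lastIndexOf('.');
        if (typeDir.endsWith("s") && dot > 0) {         // step 4.1
            String ext = filename.substring(dot + 1);   // step 4.1.1
            Set<String> types = TYPES_BY_EXTENSION.getOrDefault(ext, Collections.emptySet()); // 4.1.2
            if (types.contains(typeDir.substring(0, typeDir.length() - 1))) {
                return Layout.LEGACY;                   // step 4.1.3
            }
        }
        return Layout.DEFAULT;                          // step 4.2
    }
}
```

With these entries, `org.apache.maven.test/jars/artifactId-1.0.war` falls through to the default layout (where it would then fail to parse), which shows how much the result depends on the table's contents.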
>
>  The problem with this approach is maintaining the list of extensions to
>  artifact type.  The artifact type is arbitrary, and can be expanded
>  upon by the user to include types that we can't even imagine today.
>  (See MRM-481: issue with extension .xml.zip)
>
> The proposed logic for parsing default layout paths.
>
>  This one is easy.  Paths are in the following format (the
>  "-${classifier}" part is optional) ...
>
>  ${groupId}/${artifactId}/${baseVersion}/${artifactId}-${version}-${classifier}.${type}
>
>  Once we separate out the directories from the filename, we get the
>  following order.
>
>  dirs[dirs.length-1] = base version.
>  dirs[dirs.length-2] = artifact Id.
>  dirs[0] thru dirs[dirs.length-3] = groupId (joined with ".").
>
>  That gives us the crucial pieces in the filename,
>  ${artifactId}-${version}, which makes detecting the classifier and
>  type easy enough.
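A minimal Java sketch of default-layout parsing, with the base version in the last directory, the artifactId in the one before it, and the groupId in the rest. Assumptions: a relative path with "/" separators, a classifier that contains no ".", and timestamped snapshots of the form yyyyMMdd.HHmmss-N. It reports the file extension rather than the artifact type, since mapping one to the other is the open problem in this thread. All names are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses a default (m2) layout path into its coordinates.
final class DefaultPathParser {
    static final class Coordinates {
        final String groupId, artifactId, baseVersion, version, classifier, extension;

        Coordinates(String groupId, String artifactId, String baseVersion,
                    String version, String classifier, String extension) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.baseVersion = baseVersion;
            this.version = version;
            this.classifier = classifier;
            this.extension = extension;
        }
    }

    // Timestamped snapshot suffix, e.g. "20050831.101112-42".
    private static final Pattern TIMESTAMP = Pattern.compile("\\d{8}\\.\\d{6}-\\d+");

    static Coordinates parse(String path) {
        String[] parts = path.split("/");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Not a default layout path: " + path);
        }
        String filename = parts[parts.length - 1];
        String[] dirs = Arrays.copyOf(parts, parts.length - 1);
        String baseVersion = dirs[dirs.length - 1];
        String artifactId = dirs[dirs.length - 2];
        String groupId = String.join(".", Arrays.copyOfRange(dirs, 0, dirs.length - 2));

        String prefix = artifactId + "-";
        if (!filename.startsWith(prefix)) {
            throw new IllegalArgumentException("Filename does not start with artifactId: " + path);
        }
        String rest = filename.substring(prefix.length());

        // Deployed version: the base version itself or, for a SNAPSHOT,
        // the base version with "SNAPSHOT" replaced by a timestamp.
        String version = null;
        if (rest.startsWith(baseVersion)) {
            version = baseVersion;
        } else if (baseVersion.endsWith("-SNAPSHOT")) {
            String stem = baseVersion.substring(0, baseVersion.length() - "SNAPSHOT".length());
            if (rest.startsWith(stem)) {
                Matcher m = TIMESTAMP.matcher(rest).region(stem.length(), rest.length());
                if (m.lookingAt()) {
                    version = stem + m.group();
                }
            }
        }
        if (version == null) {
            throw new IllegalArgumentException("Version does not match directory: " + path);
        }

        // What remains is ".ext" or "-classifier.ext".
        String tail = rest.substring(version.length());
        String classifier = null;
        if (tail.startsWith("-")) {
            int dot = tail.indexOf('.');
            if (dot < 0) {
                throw new IllegalArgumentException("Missing extension: " + path);
            }
            classifier = tail.substring(1, dot);
            tail = tail.substring(dot);
        }
        if (!tail.startsWith(".") || tail.length() < 2) {
            throw new IllegalArgumentException("Missing extension: " + path);
        }
        return new Coordinates(groupId, artifactId, baseVersion, version, classifier, tail.substring(1));
    }
}
```

Anchoring on the directory-derived artifactId and base version is what makes the filename unambiguous here; the legacy layout has no such anchor.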
>
> The proposed logic for parsing legacy layout paths.
>
>  Legacy layouts are tricky.  It is nearly impossible to detect, using
>  the path alone, the correct artifactId or version.  So the process
>  will need to read the pom file associated with the artifact Id in order
>  to determine the correct Artifact Reference pieces.
>
>  The problem with this approach is that we now need 2 pieces of
>  information, the repository root (location or url) and the path.
>  Plus we incur a hit / read of the pom file.
>
>  So, if we use the pseudo-pattern ...
>  [:groupId:]/[:type:]s/[:filename:].[:ext:]
>  as a starting point, swap out the [:type:] and [:ext:] for "pom" and
>  load the pom from the actual repository to determine the groupId,
>  artifactId, and version, we can then have a valid Artifact Reference.
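A minimal Java sketch of the pom-swap step, assuming a three-part legacy path and a pom that shares the artifact's base filename. Classified files such as `...-javadoc.jar` would map to a pom that does not exist, so more rules would be needed in practice. The class and method names are hypothetical:

```java
import java.nio.file.Path;

// Hypothetical sketch of the "swap [:type:] and [:ext:] for pom" step.
final class LegacyPomLocator {
    /**
     * Maps "groupId/types/name.ext" to "groupId/poms/name.pom",
     * or returns null if the path does not have the three-part legacy shape.
     */
    static String toPomPath(String legacyPath) {
        String[] parts = legacyPath.split("/");
        if (parts.length != 3 || !parts[1].endsWith("s")) {
            return null;
        }
        String filename = parts[2];
        int dot = filename.lastIndexOf('.');
        if (dot <= 0) {
            return null;
        }
        return parts[0] + "/poms/" + filename.substring(0, dot) + ".pom";
    }

    /** Resolves the pom against the repository root, the extra input this approach needs. */
    static Path resolvePom(Path repositoryRoot, String legacyPath) {
        String pomPath = toPomPath(legacyPath);
        return pomPath == null ? null : repositoryRoot.resolve(pomPath);
    }
}
```

The second method makes the cost visible: the layout can no longer work from the path alone but needs a repository root too.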
>
>  The problem with relying on the pom is that it is now required for
>  legacy layout "from path" logic. This changes the assumption that poms
>  are optional, and it changes the interface to the layout objects so
>  that they need a repository as well.
>
> The proposed changes to the codebase.
>
>  * Eliminate RepositoryLayoutUtils; roll layout-specific filename
>    parsing routines into their respective layouts.
>  * Eliminate direct usage of BidirectionalRepositoryLayout by
>    consumers.
>  * Create RepositoryContentRequest that takes the freeform requests
>    arriving in from the "/repository/" urls and puts it through
>    the logic as outlined above.
>  * Rename BidirectionalRepositoryLayout interface to RepositoryContent
>    to simplify name and represent new role of accessing repository
>    content that requires a repository reference.
>  * Create DefaultRepositoryContent and LegacyRepositoryContent
>    implementations, that utilize techniques described above, and
>    logic already present in DefaultBidirectionalRepositoryLayout and
>    LegacyBidirectionalRepositoryLayout.
>  * Create AnonymousProjectReader that takes a File object pointing to
>    a pom, reads the <pomVersion> or <modelVersion> elements, and loads
>    the pom information as appropriate.
>  * Create RepositoryContentFactory that returns a RepositoryContent
>    implementation for the provided repository id.
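A minimal sketch of the format check an AnonymousProjectReader might start with, using the JDK's DOM parser: it looks for a top-level `<modelVersion>` (Maven 2) or `<pomVersion>` (Maven 1) element. The class and enum names are hypothetical, and loading the rest of the model is left out:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical sketch: decide whether a pom is Maven 1 or Maven 2 before loading it.
final class PomFormatSniffer {
    enum PomFormat { MAVEN1, MAVEN2, UNKNOWN }

    static PomFormat detect(File pom)
            throws IOException, ParserConfigurationException, SAXException {
        try (InputStream in = Files.newInputStream(pom.toPath())) {
            return detect(in);
        }
    }

    static PomFormat detect(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // m2 poms usually declare the POM 4.0.0 namespace
        Element root = factory.newDocumentBuilder().parse(in).getDocumentElement();
        // Only direct children of the root <project> element are inspected.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            if ("modelVersion".equals(n.getLocalName())) {
                return PomFormat.MAVEN2;
            }
            if ("pomVersion".equals(n.getLocalName())) {
                return PomFormat.MAVEN1;
            }
        }
        return PomFormat.UNKNOWN;
    }
}
```

Using `getLocalName()` with a namespace-aware parser lets the same check match both namespaced m2 poms and un-namespaced m1 poms.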
>
> Example of new RepositoryContent interface.
>
> --(snip)--
> package org.apache.maven.archiva.repository;
>
> /*
> * Licensed to the Apache Software Foundation (ASF) under one
> * or more contributor license agreements.  See the NOTICE file
> * distributed with this work for additional information
> * regarding copyright ownership.  The ASF licenses this file
> * to you under the Apache License, Version 2.0 (the
> * "License"); you may not use this file except in compliance
> * with the License.  You may obtain a copy of the License at
> *
> *  http://www.apache.org/licenses/LICENSE-2.0
> *
> * Unless required by applicable law or agreed to in writing,
> * software distributed under the License is distributed on an
> * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> * KIND, either express or implied.  See the License for the
> * specific language governing permissions and limitations
> * under the License.
> */
>
> import org.apache.maven.archiva.model.ArtifactReference;
> import org.apache.maven.archiva.model.ProjectReference;
> import org.apache.maven.archiva.model.VersionedReference;
> import org.apache.maven.archiva.repository.layout.LayoutException;
>
> import java.io.File;
> import java.net.URL;
> import java.util.List;
>
> /**
> * RepositoryContent interface for interacting with a managed repository
> * in an abstract way, without the need for processing based on
> * filesystem paths, or working with the database.
> *
> * @author <a href="mailto:joakim@erdfelt.com">Joakim Erdfelt</a>
> * @version $Id$
> */
> public interface RepositoryContent
> {
>    /**
>     * Determines if the project referenced exists in the repository.
>     *
>     * @param reference the project reference to check for.
>     * @return true if the project referenced exists.
>     */
>    public boolean hasContent( ProjectReference reference );
>
>    /**
>     * Determines if the version reference exists in the repository.
>     *
>     * @param reference the version reference to check for.
>     * @return true if the version referenced exists.
>     */
>    public boolean hasContent( VersionedReference reference );
>
>    /**
>     * Determines if the artifact referenced exists in the repository.
>     *
>     * @param reference the artifact reference to check for.
>     * @return true if the artifact referenced exists.
>     */
>    public boolean hasContent( ArtifactReference reference );
>
>    /**
>     * Given a repository relative path to a filename, return the
>     * {@link ArtifactReference} object suitable for the path.
>     *
>     * @param path the path relative to the repository base dir for
>     *        the artifact.
>     * @return the {@link ArtifactReference} representing the path.
>     *        (or null if path cannot be converted to a
>     *        {@link ArtifactReference})
>     * @throws LayoutException if there was a problem converting the
>     *         path to an artifact.
>     */
>    public ArtifactReference toArtifactReference( String path )
>        throws LayoutException;
>
>    /**
>     * Given an ArtifactReference, return the relative path to the
>     * artifact.
>     *
>     * @param reference the artifact reference to use.
>     * @return the relative path to the artifact.
>     */
>    public String toPath( ArtifactReference reference );
>
>    /**
>     * Given an ArtifactReference, return the file reference to the
>     * artifact.
>     *
>     * @param reference the artifact reference to use.
>     * @return the file for the artifact.
>     */
>    public File toFile( ArtifactReference reference );
>
>    /**
>     * Given an ArtifactReference, return the url to the artifact.
>     *
>     * @param reference the artifact reference to use.
>     * @return the URL for the artifact.
>     */
>    public URL toURL( ArtifactReference reference );
>
>
>    /**
>     * Gather up the list of related artifacts to the ArtifactReference
>     * provided. This typically includes the pom files, and those things
>     * with classifiers (such as doc, source code, test libs, etc...)
>     *
>     * NOTE: Some layouts (such as maven 1 "legacy"), and remote
>     * repositories are not compatible with this query.
>     *
>     * @param reference the reference to work off of.
>     * @return the list of ArtifactReferences for related artifacts.
>     * @throws ContentNotFoundException if the initial artifact reference
>     *         does not exist within the repository.
>     * @throws NotSupportedException if the layout or repository does not
>     *         support this query.
>     */
>    public List<ArtifactReference> getRelatedArtifacts(
>        ArtifactReference reference )
>        throws ContentNotFoundException, NotSupportedException;
>
>    /**
>     * Given a specific VersionedReference, return the list of available
>     * versions for that versioned reference.
>     *
>     * NOTE: This is really only useful when working with SNAPSHOTs.
>     *       Not compatible with remote repositories.
>     *
>     * @param reference the versioned reference to work off of.
>     * @return the list of versions found.
>     * @throws ContentNotFoundException if the versioned reference does
>     *         not exist within the repository.
>     */
>    public List<String> getVersions( VersionedReference reference )
>        throws ContentNotFoundException, NotSupportedException;
>
>    /**
>     * Given a specific ProjectReference, return the list of available
>     * versions for that project reference.
>     *
>     * @param reference the project reference to work off of.
>     * @return the list of versions found for that project reference.
>     * @throws ContentNotFoundException if the project reference does not
>     *         exist within the repository.
>     */
>    public List<String> getVersions( ProjectReference reference )
>        throws ContentNotFoundException, NotSupportedException;
> }
> --(snip)--
>
> I feel this is a better long term solution for the persistent layout
> parsing issues we have within Archiva.  However, not all of the problems
> have been solved.  I've outlined the ones that need help above in this
> email, but I'm sure there are ones that have been overlooked.
>
> Disclaimer: Yes, this is in the form of a proposal, but I'm already
> working on this, and will continue down this path unless
> someone here has a strong objection about this approach.
>
> -- 
> - Joakim Erdfelt
>  Committer and PMC Member, Apache Maven
>  Archiva Developer
>  joakime@apache.org

--
Brett Porter - brett@apache.org
Blog: http://www.devzuz.org/blogs/bporter/
