From: John Casey
To: Maven 2 Developers List
Date: Thu, 17 Mar 2005 19:13:00 -0500
Subject: new repository conversion and cleanup tool in sandbox: repoclean
Message-ID: <423A1D0C.5000605@commonjava.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

M2 Devs,

Sorry for the long email, but please hear me out. I feel like I finally have enough of an understanding of this problem that I can talk intelligently with others. So, here is a summary of my first pass at solving the problem of repository conversion...

I know that we already have two repository-massaging tools in maven-components. I looked at both when trying to address MNG-197. In the case of the pre-alpha-to-v4 converter, it didn't seem expandable into the world of v3 poms without a major overhaul. Which leaves the repository-tool subproject. There are two reasons why I didn't just enhance repository-tool:

1. I didn't feel like I had enough time before this weekend's mini-deadline to understand and complete the repository-tool. This would have involved quite a lot of code-reading, and double-checking against v3 documentation to ensure that all bases are covered.

2. It looks like this would be better suited as a plexus application. Refactoring to a plexus-based app will allow us to introduce new validators and/or patching components to improve the conversion process.

I don't know if these were reason enough to justify a whole new application, and I'd be happy to merge the repoclean tool with repository-tool in the future when time permits.

Carlos, especially for your sake, I'd like to outline the features and approach of repoclean. Once you have an idea of what I've done, perhaps you can suggest some things I missed, and we can start bringing these tools toward a merger.

Essentially, I was trying to follow the contents of MNG-197 as closely as possible with this tool. The resulting architecture is an acknowledgement of the fact that we're really doing whole-repository maintenance here, not merely translating individual poms. Here is a brief rundown of the design:

1. Main class which processes command-line arguments, starts an Embedder instance, looks up an instance of the RepositoryCleaner component, and fires the cleanRepository() method.

2. RepositoryCleaner is the controller class for the application. Using plexus' dependency injection, I have access to the components that execute the various operations on the repository. Some of these operate on the whole repository at a go, and others operate on a per-POM basis.

   A. Verify the validity of both the repository basedir and the reports basedir. If either is invalid, error out.

   B. Scan the repository to create a list of pom files in the repository. We'll use this list multiple times later, so this is an optimization step.

   C. Scan the repository to create a list of artifact (non-pom, non-md5) files in the repository. We'll use this list multiple times later, so this is an optimization step.

   D. Set up the reporter for the repo-level operations.

   E. Call the ArtifactPomCorrelator which matches POMs to artifacts, and spits out error messages to the reporter for any orphaned artifacts. This is a repository-level operation.

   F. Call the ArtifactMd5Correlator which matches artifacts to MD5 digest files, and spits out error messages to the reporter for any artifacts that are not accompanied by MD5 digests. If we're not executing in report-only mode, this component will also create any missing md5 files. This is also a repository-level operation.

   G. Now, we move into the per-POM operations. For each POM, we first set up a Reporter to record errors/warnings/etc. pertaining only to that POM.

   H. Read the v3 POM from file.

   I. Translate the v3 POM to a v4 POM using the PomV3ToV4Translator. This will spit out warnings to the reporter for any elements that don't translate (like aspectSourceDirectory), and errors where only partial information is provided (as in distributionSite/distributionDirectory). This is the only validation provided by the translator.

   J. Call the V4ModelIndependenceValidator to verify the ability of that model to provide the minimum required information set to distinguish one project from another, independent of any information in a parent model (via the element, which is ignored). On this pass, only report failures as warnings to the reporter.

   K. If (J.) above fails, call the V4ModelPatcher to parse the path of the POM in the repository in an effort to glean any information that may be missing from the model. If the path is valid, fill in any missing information in the model.

   L. If (J.) above fails, re-call the V4ModelIndependenceValidator, this time in error-reporting mode. If the model is still missing required information, this time the validator will report errors instead of warnings.

   M. If we're not executing in report-only mode, write the v4 POM to the repository in place of the old v3 POM file.

   N. Flush all reporters.

As you can see, this tool does not account for backup/restore operations on the repository. It is assumed that measures will be taken outside the scope of the tool to make a backup copy of the repository before execution.

If I'm missing anything in this, please let me know. I've included a bash script to install the tool at a location of your choice using:

'sh ./install.sh /path/to/target/install/dir /path/to/local/repo'

and another bash script to execute the tool using:

'./repoclean.sh /path/to/repository /path/to/reports/directory'

I think repoclean is a reasonable first stab at this problem, but I know it needs to be much better than that. Please don't hesitate to shoot holes in this thing! :)

Also: I will be duplicating this doco in an APT file somewhere in maven-components, so that we can start recording the design discussion.

Thanks,

john

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFCOh0LK3h2CZwO/4URAp/BAJwIT/F9tlVgnhICWJCXMHy2E8tWEQCeNb8F
3jGhFSdMOK1sp05khzPJQ94=
=ZE3H
-----END PGP SIGNATURE-----
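
For illustration only, step E (matching POMs to artifacts and flagging orphans) could be sketched roughly as below. This is not the repoclean code. The class and method names are hypothetical, and the sketch assumes the legacy layout where an artifact at group/jars/name-version.jar is described by group/poms/name-version.pom, with paths given relative to the repository root.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of orphan detection; not the actual ArtifactPomCorrelator.
final class PomCorrelatorSketch {

    /**
     * Reduces "group/jars/name-1.0.jar" or "group/poms/name-1.0.pom" to "group/name-1.0".
     * Returns null for paths that do not have exactly three segments (assumed layout).
     */
    static String key(String repoRelativePath) {
        String[] parts = repoRelativePath.split("/");
        if (parts.length != 3) {
            return null;
        }
        String file = parts[2];
        int dot = file.lastIndexOf('.');
        String base = dot > 0 ? file.substring(0, dot) : file;
        return parts[0] + "/" + base;
    }

    /** Returns every artifact path for which no POM with the same key exists. */
    static List<String> findOrphans(List<String> pomPaths, List<String> artifactPaths) {
        Set<String> pomKeys = new HashSet<>();
        for (String pom : pomPaths) {
            String k = key(pom);
            if (k != null) {
                pomKeys.add(k);
            }
        }
        List<String> orphans = new ArrayList<>();
        for (String artifact : artifactPaths) {
            String k = key(artifact);
            if (k == null || !pomKeys.contains(k)) {
                orphans.add(artifact);
            }
        }
        return orphans;
    }
}
```

Building the set of POM keys once keeps the check cheap per artifact, which fits the idea of scanning the repository a single time in steps B and C and reusing the lists.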
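
A minimal sketch of the behavior described in step F, for discussion purposes. It is not the actual ArtifactMd5Correlator. The names are hypothetical, it uses the current java.nio.file API rather than whatever the tool itself uses, and it assumes the digest lives next to the artifact as <artifact file name>.md5 containing the hex digest.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of MD5 correlation; not the actual ArtifactMd5Correlator.
final class Md5CorrelatorSketch {

    /** Hex-encodes the MD5 digest of the given bytes as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    /**
     * Returns the artifacts that have no sibling ".md5" file (these would be reported).
     * Unless reportOnly is set, the missing digest file is also written.
     */
    static List<Path> correlate(List<Path> artifacts, boolean reportOnly) throws IOException {
        List<Path> missing = new ArrayList<>();
        for (Path artifact : artifacts) {
            // Assumed naming convention: foo-1.0.jar -> foo-1.0.jar.md5
            Path md5File = artifact.resolveSibling(artifact.getFileName() + ".md5");
            if (Files.exists(md5File)) {
                continue;
            }
            missing.add(artifact);
            if (!reportOnly) {
                String hex = md5Hex(Files.readAllBytes(artifact));
                Files.write(md5File, hex.getBytes(StandardCharsets.US_ASCII));
            }
        }
        return missing;
    }
}
```

Running correlate twice in write mode should report nothing the second time, since the first run creates the missing digests.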
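
The validate, patch, re-validate sequence in steps J through L might look something like the following. Again this is only a sketch under assumptions: Model and Reporter here are hypothetical stand-ins for the real classes, the required information set is assumed to be groupId, artifactId and version, and the path pattern assumes POMs live at <groupId>/poms/<artifactId>-<version>.pom. Splitting artifactId from version at the first hyphen followed by a digit is a heuristic and will misparse artifact ids that contain such a sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of steps J-L; not the actual validator or patcher.
final class ValidatePatchSketch {

    /** Stand-in for the v4 model, reduced to the identifying fields. */
    static final class Model {
        String groupId;
        String artifactId;
        String version;
    }

    /** Stand-in for a per-POM reporter. */
    static final class Reporter {
        final List<String> warnings = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    /** Returns true if the model identifies itself without help from a parent model. */
    static boolean validate(Model m, Reporter r, boolean asErrors) {
        List<String> sink = asErrors ? r.errors : r.warnings;
        boolean ok = true;
        if (m.groupId == null) { sink.add("missing groupId"); ok = false; }
        if (m.artifactId == null) { sink.add("missing artifactId"); ok = false; }
        if (m.version == null) { sink.add("missing version"); ok = false; }
        return ok;
    }

    // Assumed layout: <groupId>/poms/<artifactId>-<version>.pom
    private static final Pattern POM_PATH =
        Pattern.compile("([^/]+)/poms/(.+?)-(\\d[^/]*)\\.pom");

    /** Fills in only the fields that are missing; returns false if the path is not usable. */
    static boolean patchFromPath(Model m, String repoRelativePath) {
        Matcher match = POM_PATH.matcher(repoRelativePath);
        if (!match.matches()) {
            return false;
        }
        if (m.groupId == null) m.groupId = match.group(1);
        if (m.artifactId == null) m.artifactId = match.group(2);
        if (m.version == null) m.version = match.group(3);
        return true;
    }

    /** J: validate with warnings. K: patch from the path. L: re-validate with errors. */
    static boolean process(Model m, String repoRelativePath, Reporter r) {
        if (validate(m, r, false)) {
            return true;
        }
        patchFromPath(m, repoRelativePath);
        return validate(m, r, true);
    }
}
```

The first pass only warns, so a POM that can be fully repaired from its path ends up with warnings but no errors in its report.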