archiva-dev mailing list archives

From Joakim Erdfelt
Subject State of the Archiva (April 2007)
Date Tue, 10 Apr 2007 20:25:07 GMT
State of the Archiva (April 2007)


  Work is continuing on the archiva-jpox-database branch.
  That branch now carries a large number of improvements.
  The core / base / database changes have settled down; work now
  continues in the webapp to take advantage of the changes made in archiva-base.

:: FUTURE ::

  BRANCH to TRUNK merge.

  In roughly two weeks' time, the branch will come up for a vote to be merged
  with trunk.   When this gets approval, the new trunk will undergo a
  review of the existing jiras to determine whether they still apply or
  can be closed as fixed.

  This is a good time to update the documentation to reflect the current
  archiva UI and configuration process.


  Once the critical jiras have been closed, the initial release of Archiva
  1.0-alpha-1 should be cut.

  After this has occurred, progress will continue on 1.0 following the
  outline in

  When we have a feature complete 1.0 (as per the roadmap) we'll start
  the 1.0-M1 release.

  When we have 2 solid weeks on a Milestone release without major bug
  reports we'll start the vote for 1.0 final.


  First: let's take a brief tour of the directories.

  |-- archiva-base/
  |   |-- archiva-common/
  |   |-- archiva-configuration/
  |   |-- archiva-consumers/                   (NEW)
  |   |   |-- archiva-consumer-api/            (NEW)
  |   |   |-- archiva-core-consumers/          (NEW)
  |   |   |-- archiva-database-consumers/      (NEW)
  |   |   |-- archiva-lucene-consumers/        (NEW)
  |   |   `-- archiva-signature-consumers/     (NEW)
  |   |-- archiva-converter/
  |   |-- archiva-indexer/
  |   |-- archiva-model/                       (NEW)
  |   |-- archiva-proxy/
  |   |-- archiva-repository-layer/
  |   |-- archiva-scheduled/                   (NEW)
  |   `-- archiva-xml-tools/                   (NEW)
  |-- archiva-cli/
  |-- archiva-database/                        (NEW)
  |-- archiva-reporting/
  |   `-- archiva-report-manager/              (Was
  |-- archiva-site/
  |-- archiva-web/
  |   |-- archiva-applet/
  |   |-- archiva-security/
  |   |-- archiva-standalone/
  |   |   |-- archiva-plexus-application/
  |   |   `-- archiva-plexus-runtime/
  |   |-- archiva-webapp/
  |   `-- archiva-webapp-test/
  |-- design/
  |   |-- logos/
  |   `-- white-site/
  `-- maven-meeper/


  modules refactored out of existence:
    * archiva-discoverer
      The classes here have been simplified and merged with
    * archiva-core
      The classes in here have been moved to archiva-repository-layer,
      archiva-common, archiva-model, and archiva-consumer-api

  The archiva-repository-layer module is now the nexus for all things
  that work against the repository.

  The role of the archiva.xml configuration file has changed from being
  the canonical source for 'configured' repositories, to being a bootstrap
  for configured repositories stored and maintained in the database, and
  the list of active consumers to use in the various stages of content
  consumption. (more on that later)

  The use of maven-artifact and maven-project has been removed as the
  assumptions present in each (everything is for the purposes of a build)
  are inappropriate for archiva and jpox.  The new inbuilt replacements
  are more resilient to missing referenced data.

    I had to establish a new set of terminology to describe bits in
    the database.

     Name        | Group ID | Artifact ID | Version | Classifier | Type |
     Project     |  yes     |   yes       |         |            |      |
     Versioned   |  yes     |   yes       |   yes   |            |      |
     Artifact    |  yes     |   yes       |   yes   |    yes     | yes  |

   These terms (Project, Versioned, Artifact) describe the hierarchy that
   is present in the repository.
   One Project can contain multiple versions, and each version can contain
   multiple artifacts.
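
   A minimal sketch of that hierarchy, assuming Java 16+ records. Every
   class and method name here (CoordinateSketch, ProjectKey, VersionedKey,
   ArtifactKey, group) is hypothetical and chosen for illustration; none of
   them is taken from archiva-model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; these are not the classes in archiva-model.
class CoordinateSketch {
    // Project: group ID + artifact ID.
    record ProjectKey(String groupId, String artifactId) {}

    // Versioned: a Project plus a version.
    record VersionedKey(ProjectKey project, String version) {}

    // Artifact: a Versioned plus classifier and type.
    record ArtifactKey(VersionedKey versioned, String classifier, String type) {}

    // Groups a flat list of artifacts as Project -> Versioned -> Artifacts.
    static Map<ProjectKey, Map<VersionedKey, List<ArtifactKey>>> group(List<ArtifactKey> artifacts) {
        Map<ProjectKey, Map<VersionedKey, List<ArtifactKey>>> tree = new LinkedHashMap<>();
        for (ArtifactKey a : artifacts) {
            tree.computeIfAbsent(a.versioned().project(), p -> new LinkedHashMap<>())
                .computeIfAbsent(a.versioned(), v -> new ArrayList<>())
                .add(a);
        }
        return tree;
    }

    public static void main(String[] args) {
        ProjectKey project = new ProjectKey("org.example", "demo");
        VersionedKey v10 = new VersionedKey(project, "1.0");
        VersionedKey v11 = new VersionedKey(project, "1.1");
        List<ArtifactKey> artifacts = List.of(
            new ArtifactKey(v10, "sources", "jar"),
            new ArtifactKey(v10, "javadoc", "jar"),
            new ArtifactKey(v11, "sources", "jar"));

        // Walk the tree: one project, its versions, and each version's artifacts.
        group(artifacts).forEach((p, versions) -> {
            System.out.println(p.groupId() + ":" + p.artifactId());
            versions.forEach((v, list) ->
                System.out.println("  " + v.version() + " -> " + list.size() + " artifact(s)"));
        });
    }
}
```

   The nesting of the records mirrors the table: each level adds only the
   columns that the level above it leaves empty.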


  The scanning of content from the repository occurs in 2 major stages.

  Major Stage 1:  Scan of repository filesystem.
    Artifacts Stage:
      a) Find the new artifacts and put them into the database as
      b) Find the maven-metadata.xml and put them into the database.
      c) Validate checksums (and report issues).
      d) Create missing checksums.
    Content Stage:
      a) Index content (lucene)
    Bad Content Stage:
      a) Auto remove known bad content.
      b) Auto rename known common filename issues.
      c) Flag remaining unknown content as bad (in report).

  Major Stage 2:  Scan of artifacts from database.
    Unprocessed Artifacts Stage:
      a) Find pom artifacts and load project model into database.
      b) Index artifact details (lucene)
      c) Validate repository metadata.
      d) Index archiva table of contents (lucene)
      e) Update bytecode information in artifact-java-details.
      f) Index public methods (lucene)
    Processed Artifacts Stage:
      a) Artifact not present in filesystem, remove artifact from db.
      b) Artifact of type 'pom' not present in filesystem, remove project
         model from db
      c) Artifact not present in filesystem, remove from lucene index.

  The benefit of these stages is that it allows the content to be found on
  the filesystem and be made available to the users via the browse interface
  relatively quickly. (Takes about 6 minutes to scan all of ibiblio this

  If a user happens to request a versioned project browse that has
  yet to undergo Major Stage 2, a 'Just in Time' scan of that specific
  project is done.
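
  The 'Just in Time' behaviour can be sketched roughly as follows. This is
  only an illustration of the idea: the class JustInTimeSketch, its methods,
  and the string keys are all hypothetical, and the real Major Stage 2 work
  is replaced by a counter.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Archiva code base.
class JustInTimeSketch {
    // Versioned projects that have already been through Major Stage 2.
    private final Set<String> processed = new HashSet<>();
    private int scansRun = 0;

    // Stands in for the Major Stage 2 work on one versioned project.
    private void databaseScan(String versionedKey) {
        scansRun++;
        processed.add(versionedKey);
    }

    // Called by the regular Stage 2 run; skips work already done.
    void scheduledScan(String versionedKey) {
        if (!processed.contains(versionedKey)) {
            databaseScan(versionedKey);
        }
    }

    // Called when a user browses; scans first if Stage 2 has not reached it yet.
    String browse(String versionedKey) {
        if (!processed.contains(versionedKey)) {
            databaseScan(versionedKey);
        }
        return "details for " + versionedKey;
    }

    int scansRun() {
        return scansRun;
    }

    public static void main(String[] args) {
        JustInTimeSketch sketch = new JustInTimeSketch();
        System.out.println(sketch.browse("org.example:demo:1.0"));
        System.out.println("scans run: " + sketch.scansRun());
    }
}
```

  Because both entry points check the same set, a project scanned on demand
  is not scanned again when the regular Stage 2 run reaches it.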

  The repository scan has been changed to include all content "**/*" and
  specifically exclude known ignorable content. For each discovered file
  a determination is made to see if it falls into the Artifact list or
  the Content list, if it doesn't fall into those two lists.

  The archiva.xml contains the lists of patterns for ...
    a) Artifacts
    b) Indexable Content
    c) Auto-Remove
    d) Ignored

  For the latest lists, in code, see:


  This is a fundamental part of how archiva knows what to do with the
  content it is tracking.
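
  As a rough sketch of pattern-based classification, assuming glob patterns
  evaluated with the standard java.nio PathMatcher. The category names follow
  the four lists above, but the sample patterns, the order of the checks, and
  the class PatternClassifierSketch are assumptions made for illustration;
  they are not Archiva's defaults or its implementation.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

// Illustrative only: patterns and check order are assumptions.
class PatternClassifierSketch {
    enum Category { IGNORED, ARTIFACT, INDEXABLE_CONTENT, AUTO_REMOVE, UNKNOWN }

    // Hypothetical sample patterns, one list per category.
    static final List<String> IGNORED = List.of("**/.svn/**", "**/.htaccess");
    static final List<String> ARTIFACTS = List.of("**/*.jar", "**/*.pom");
    static final List<String> INDEXABLE = List.of("**/*.txt");
    static final List<String> AUTO_REMOVE = List.of("**/*.bak");

    // True if the path matches at least one glob in the list.
    static boolean matchesAny(List<String> globs, Path file) {
        for (String glob : globs) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
            if (matcher.matches(file)) {
                return true;
            }
        }
        return false;
    }

    // Returns the first category whose patterns match (a simplification);
    // anything left over is reported as UNKNOWN.
    static Category classify(String relativePath) {
        Path file = Path.of(relativePath);
        if (matchesAny(IGNORED, file)) return Category.IGNORED;
        if (matchesAny(ARTIFACTS, file)) return Category.ARTIFACT;
        if (matchesAny(INDEXABLE, file)) return Category.INDEXABLE_CONTENT;
        if (matchesAny(AUTO_REMOVE, file)) return Category.AUTO_REMOVE;
        return Category.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String path : List.of(
                "org/example/demo/1.0/demo-1.0.jar",
                "org/example/demo/1.0/README.txt",
                "org/example/.svn/entries",
                "org/example/demo/1.0/mystery.bin")) {
            System.out.println(path + " -> " + classify(path));
        }
    }
}
```

  Checking the ignored patterns first matches the "include everything, then
  exclude" approach described above; the UNKNOWN result corresponds to the
  content that ends up flagged in the report.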

  We have 2 major consumer api interfaces.

  RepositoryContentConsumer -
    This consumer interface is used for those consumers that want to operate
    on the raw files in the repository filesystem.

  ArchivaArtifactConsumer   -
    This consumer interface is used for those consumers that want to operate
    on artifacts.  Consumers operating in the second major stage (outlined
    above as the scan of artifacts from the database) should use this interface.
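
  To illustrate the split between the two kinds of consumer, here is a
  hypothetical sketch. The method signatures of RepositoryContentConsumer
  and ArchivaArtifactConsumer are not shown in this message, so the
  interfaces below (FileConsumer, ArtifactConsumer), the ArtifactRef record,
  and the two run methods are invented stand-ins, not the Archiva API.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shapes only; not the real consumer interfaces.
class ConsumerSketch {
    // Stage 1 style: operates on raw files in the repository filesystem.
    interface FileConsumer {
        void processFile(Path relativeFile);
    }

    // Minimal artifact reference used by the stage 2 style consumer.
    record ArtifactRef(String groupId, String artifactId, String version,
                       String classifier, String type) {}

    // Stage 2 style: operates on artifacts already recorded in the database.
    interface ArtifactConsumer {
        void processArtifact(ArtifactRef artifact);
    }

    // Hands every discovered file to every file consumer.
    static void runFileStage(List<Path> files, List<FileConsumer> consumers) {
        for (Path file : files) {
            for (FileConsumer consumer : consumers) {
                consumer.processFile(file);
            }
        }
    }

    // Hands every artifact to every artifact consumer.
    static void runArtifactStage(List<ArtifactRef> artifacts, List<ArtifactConsumer> consumers) {
        for (ArtifactRef artifact : artifacts) {
            for (ArtifactConsumer consumer : consumers) {
                consumer.processArtifact(artifact);
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        FileConsumer fileLogger = file -> log.add("file: " + file);
        ArtifactConsumer pomLoader = a -> {
            if ("pom".equals(a.type())) {
                log.add("load project model: " + a.groupId() + ":" + a.artifactId());
            }
        };

        runFileStage(List.of(Path.of("org/example/demo/1.0/demo-1.0.pom")), List.of(fileLogger));
        runArtifactStage(
            List.of(new ArtifactRef("org.example", "demo", "1.0", "", "pom")),
            List.of(pomLoader));
        log.forEach(System.out::println);
    }
}
```

  Keeping the two interfaces separate means a consumer only has to deal
  with the representation (file or database artifact) its stage provides.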

  This allows for a very simplified content scan and manipulation in

- Joakim Erdfelt
