archiva-dev mailing list archives

From Joakim Erdfelt <>
Subject Re: Archiva Consumers question
Date Wed, 17 Oct 2007 14:27:59 GMT
Everything in the UI uses the database.
A full scan from disk to database takes *FAR* too long.

This was set up as a tiered effort.
First step: get the real, valid, useful content off of disk and into the 
database in a usable form.
Second step: expand the data in the database fully.

The first step makes the content available for browsing immediately, on 
the order of ~55ms per file.
The second step takes an average of 3 seconds per file right now.
NOTE: Even though the database scan is short now, it is expected to 
take multiple minutes per artifact in the future as the consumer 
complexity starts to grow. (Think bytecode scanning / checksumming / 
indexing / cross-referencing, and gpg signature confirmation, to name 
just a few of the processes we are aware of.)
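The tiered approach above can be sketched roughly as below. This is a minimal, hypothetical illustration, not the real Archiva consumer API: the class and method names (`TieredScanSketch`, `DatabaseConsumer`, `repositoryScan`, `databaseScan`) are invented for this sketch, and the "database" is just an in-memory map standing in for the real store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-phase scan described above.
// Phase 1 (repository scan): cheap per-file work so content is
// browsable immediately. Phase 2 (database scan): expensive
// consumers (checksums, indexing, ...) run against the database later.
public class TieredScanSketch {

    // Stand-in for the minimal record phase 1 writes to the database.
    static class ArtifactRecord {
        final String path;
        boolean fullyProcessed = false;
        ArtifactRecord(String path) { this.path = path; }
    }

    // A phase-2 consumer doing the slow, per-artifact work.
    interface DatabaseConsumer {
        void processArtifact(ArtifactRecord record);
    }

    private final Map<String, ArtifactRecord> database = new LinkedHashMap<>();
    private final List<DatabaseConsumer> consumers = new ArrayList<>();

    public void addConsumer(DatabaseConsumer c) { consumers.add(c); }

    // Phase 1: record each file quickly (~ms per file).
    public void repositoryScan(List<String> filesOnDisk) {
        for (String path : filesOnDisk) {
            database.putIfAbsent(path, new ArtifactRecord(path));
        }
    }

    // Phase 2: run every expensive consumer over unprocessed records.
    public void databaseScan() {
        for (ArtifactRecord record : database.values()) {
            if (!record.fullyProcessed) {
                for (DatabaseConsumer c : consumers) {
                    c.processArtifact(record);
                }
                record.fullyProcessed = true;
            }
        }
    }

    public int recordCount() { return database.size(); }
}
```

The key property is that phase 1 never waits on phase 2, so a large repository becomes browsable long before the expensive consumers finish.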

In large repositories, this is the only way to get the content available 
within a 24 hour window.
This problem also existed with the old technique when combining a large 
repository with a scan for new content.

- Joakim

Wendy Smoak wrote:
> On 10/16/07, Joakim Erdfelt <> wrote:
>> ArchivaArtifactConsumer is an abstract-dealing-with-artifacts consumer.
>> RepositoryContentConsumer is for files.
>> A file that isn't an artifact can be *.xml, *.sha1, *.md5,
>> maven-metadata.xml, bad content, poorly named content, etc.
>> Would it be better to state the phase/scan instead?
>> RepositoryContentConsumer becomes -> RepositoryScanConsumer
>> ArchivaArtifactConsumer becomes -> DatabaseScanConsumer
> All artifacts _are_ repository content, are they not?  And even after
> the renaming... it can't be in the database unless it's in the
> repository.
> I understand scanning the filesystem to update the database.  But when
> and why do you "scan" the database?

- Joakim Erdfelt
  Open Source Software (OSS) Developer
