jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: importing jackrabbit into jackrabbit
Date Thu, 26 Apr 2007 08:10:59 GMT
Stefan Kurla wrote:
> As far as the file nodetype is concerned, this is a custom nodetype
> which has 4 references per file imported and currently, all the
> references are made to the same UUID since we are testing, this could
> change in the future.

This may be the time-consuming factor. Whenever a reference is added that points
to a node N, the complete set of references pointing to N is rewritten to the
persistence manager. As the number of references to N grows, this will slow
down your import. Is there a reason why all files point to the same node?
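
A rough way to see why this gets slower over time is the back-of-the-envelope model below. It assumes (my simplification, not a measurement of Jackrabbit internals) that every added reference triggers one rewrite of the whole reference set for N, so the k-th reference writes k entries. The class and method names are made up for illustration.

```java
// Hypothetical cost model, not Jackrabbit code: counts how many reference
// entries get written in total if each added reference to node N rewrites
// the complete set of references to N.
final class ReferenceCostModel {

    /** Entries written when the k-th reference is added (the whole set so far). */
    static long entriesWrittenForAdd(int k) {
        return k;
    }

    /** Total entries written after adding n references: 1 + 2 + ... + n. */
    static long totalEntriesWritten(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        int refsPerFile = 4; // from the original post: 4 references per file
        for (int files : new int[] {1_000, 12_000}) {
            int refs = files * refsPerFile;
            System.out.println(files + " files -> " + refs + " references, "
                    + totalEntriesWritten(refs) + " entries written in total");
        }
    }
}
```

Under that assumption the total work grows quadratically with the number of references to N, which would match an import that slows down file by file. If the references are saved in batches, the set is rewritten less often, so treat the numbers as an upper bound.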

> Any tips or ideas? I will update the results of the test. Right now I
> have imported 1K out of 12K files and the import time has gone up to 4
> seconds per file. Is this normal? Remember since I am importing the
> jackrabbit SVN all files are put under one nt:folder which is
> "jackrabbit". This is a pretty normal case of about 12K files and only
> 78MB. We have plans of a 1TB repository.

I did a quick test with an adapted version of 
http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/TextExtractorTest.java
that saves changes whenever 100 files have been imported.
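
The "save every 100 files" part can be sketched roughly as follows. This is not the code of the adapted test; `BatchedImport`, `importOne` and `save` are placeholder names. In a JCR client the `save` callback would typically be a call to `Session.save()`, and `importOne` would create the nt:file node for one file.

```java
import java.util.function.Consumer;

// Hypothetical helper, not taken from TextExtractorTest: imports items one
// by one and persists the pending changes after every batchSize items.
final class BatchedImport {

    /**
     * Calls importOne for each item, runs save after every batchSize items,
     * and once more at the end if a partial batch is still pending.
     * Returns the number of times save was run.
     */
    static <T> int importAll(Iterable<T> items, int batchSize,
                             Consumer<T> importOne, Runnable save) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int saves = 0;
        for (T item : items) {
            importOne.accept(item);
            pending++;
            if (pending == batchSize) {
                save.run(); // e.g. session.save() in a JCR client
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {
            save.run(); // flush the trailing partial batch
            saves++;
        }
        return saves;
    }
}
```

Saving in batches keeps the transient space small without paying the cost of a save for every single file.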

I used the svn export of jackrabbit/trunk (~3000 files in ~900 folders)

configuration:
- jackrabbit in-process
- o.a.j.c.persistence.db.DerbyPersistenceManager (externalBlobs = false)
- text extractors: pdf, xml and plain text

test result:

Imported 2978 files in 50484 ms.
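
For comparison with the 4 seconds per file mentioned above, the per-file figure can be derived from this result with a few lines of arithmetic. The class name is made up, and the two runs used different content and configuration, so this is only a rough comparison.

```java
// Plain arithmetic on the numbers reported in this thread.
final class ImportThroughput {

    /** Average milliseconds spent per imported file. */
    static double msPerFile(long totalMs, int files) {
        return (double) totalMs / files;
    }

    /** Average number of files imported per second. */
    static double filesPerSecond(long totalMs, int files) {
        return files * 1000.0 / totalMs;
    }

    public static void main(String[] args) {
        long totalMs = 50484; // from the test result above
        int files = 2978;     // from the test result above
        System.out.printf("%.2f ms per file, %.1f files per second%n",
                msPerFile(totalMs, files), filesPerSecond(totalMs, files));
    }
}
```

That works out to roughly 17 ms per file in this test.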

regards
  marcel
