jackrabbit-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: JackRabbit performance tests report
Date Fri, 24 Nov 2006 14:17:50 GMT
Hi,

On 11/24/06, Marcin Nowak <marcin.j.nowak@comarch.com> wrote:
> Recently I've performed some common tests of JackRabbit performance. The results
> of these tests can be seen in the attached files. I would like to hear your
> comments - and your answer to point 4.4, Suggestions and wishes.

Thanks a lot for sharing the results with us! This is very interesting
data. Some quick comments:

> The most important thing to us is to improve the performance of importing XML
> documents into the repository and to reduce the RAM overhead to, let's say,
> 20x. Moreover, we are interested in importing 20[MB] XML documents in
> 10 minutes, with memory usage not exceeding 400[MB] of RAM and CPU
> usage that still allows other users to work normally with the repository.

The limiting factor with importing large XML documents is that the
entire set of changes is built within the transient space of a session
before being persisted. And since a Jackrabbit NodeState is even
bigger than your average DOM element node, your memory usage will grow
rapidly.

The preferred alternative I've been using for importing large XML
documents is to create a custom importer class that calls
Session.save() every now and then to persist the pending changes. You
need to be careful to avoid inconsistencies like broken references
with this approach, but otherwise it works fine.
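
For illustration, a minimal sketch of such an importer could look something
like this (the class name, batch size, and the mapping of elements to
nt:unstructured nodes are just placeholders; namespace handling and element
text content are ignored):

import java.io.InputStream;

import javax.jcr.Node;
import javax.jcr.Session;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Sketch of a batched importer: every XML element becomes an
 * nt:unstructured node, and Session.save() is called after every
 * BATCH_SIZE elements so the transient space stays small.
 */
public class BatchedXmlImporter extends DefaultHandler {

    private static final int BATCH_SIZE = 1000; // tune to your memory budget

    private final Session session;
    private Node current;
    private int pending = 0;

    public BatchedXmlImporter(Session session, Node importRoot) {
        this.session = session;
        this.current = importRoot;
    }

    public void importXml(InputStream in) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, this);
        session.save(); // persist the last partial batch
    }

    public void startElement(String uri, String local, String qname,
            Attributes atts) {
        try {
            // One child node per element, attributes become properties.
            current = current.addNode(qname, "nt:unstructured");
            for (int i = 0; i < atts.getLength(); i++) {
                current.setProperty(atts.getQName(i), atts.getValue(i));
            }
            if (++pending >= BATCH_SIZE) {
                session.save(); // flush pending changes to persistent storage
                pending = 0;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public void endElement(String uri, String local, String qname) {
        try {
            current = current.getParent(); // step back up to the parent node
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}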

> The thing which is also very important to us is server CPU usage when deleting
> repository content – currently this operation drives the server to 100% load and
> blocks all other actions, preventing other users from performing any actions on
> the repository.

Large deletes are similarly expensive in that all the deleted states
need to be loaded into the transient space before Session.save() gets
called. Deletes also trigger a number of internal consistency checks,
for example to ensure that no broken references are left behind.

Here as well I'd recommend trying to break large deletes into a sequence
of smaller operations.
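
For example, a rough sketch along these lines (the helper class and batch
size are made up, not part of Jackrabbit) keeps the transient space small:

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class BatchedDelete {

    /**
     * Removes all children of the given node in batches, saving after
     * each batch instead of building one huge transient change set.
     */
    public static void removeChildren(Session session, Node parent,
            int batchSize) throws RepositoryException {
        while (parent.hasNodes()) {
            // Re-acquire the iterator after each save, since the child
            // list changes as nodes are removed.
            NodeIterator it = parent.getNodes();
            int removed = 0;
            while (it.hasNext() && removed < batchSize) {
                it.nextNode().remove();
                removed++;
            }
            session.save(); // flush the deleted states out of the transient space
        }
    }
}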

> It is also crucial for us to be able to export a repository of size 100[MB] to an
> XML file and to restore the whole repository from that file later.

Unless you are able to use a custom importer, I would recommend
backing up and restoring the entire repository directory instead. This
approach requires careful synchronization, ideally a repository
shutdown during backup and restore, but avoids the performance issues
of exporting and importing large XML documents.
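
A rough sketch of that approach, assuming a Jackrabbit RepositoryImpl
instance so that shutdown() is available and assuming the repository home
directory is known (the class and method names here are just illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

import org.apache.jackrabbit.core.RepositoryImpl;

public class RepositoryBackup {

    /**
     * Copies the whole repository home directory to a backup location.
     * The repository is shut down first so that no files are being
     * written while the copy runs; restoring is the reverse copy,
     * again with the repository shut down.
     */
    public static void backup(RepositoryImpl repository, Path repositoryHome,
            Path backupDir) throws IOException {
        repository.shutdown();

        try (Stream<Path> paths = Files.walk(repositoryHome)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = backupDir.resolve(repositoryHome.relativize(source));
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}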

There are some improvements we could make to handle large XML
document imports better, but none of them are easy to implement,
especially since we need to maintain spec compliance. There is an
ongoing effort to refactor parts of the XML importer mechanism, and
one outcome of that effort could be to make it easier to write custom
importers for large XML documents.

> We would also like to suggest improving network usage when
> communicating with the server.

The JCR-RMI layer is inherently quite verbose, since it maps almost all
JCR API calls to serialized RMI method invocations. There are some
tricks we could use to speed things up and reduce network usage, but
the ongoing SPI effort seems to offer a much better alternative for
remote repository access, so JCR-RMI is currently not being actively
improved.

BR,

Jukka Zitting