jackrabbit-users mailing list archives

From "Seidel. Robert" <Robert.Sei...@aeb.de>
Subject Write performance problems
Date Wed, 26 Jun 2013 17:52:26 GMT

I'm using Jackrabbit 2.4.2 and am facing performance problems. I know write performance is a
major weakness of Jackrabbit, because all writes are effectively done single-threaded.

I want to migrate data from some old software into a Jackrabbit repository that uses a
database bundle persistence manager (clustered environment, versionable node type).
The process is to fetch a block of old data sets (maybe 500), create nodes for them, save
the session, and then repeat with the next block.
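To make the batching loop concrete, here is a minimal sketch of it. RepositoryWriter and CountingWriter are hypothetical stand-ins that I made up so the example runs without a repository; in the real migration, createNode() would add a node through the JCR session and save() would be session.save().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the JCR session; not part of Jackrabbit. */
interface RepositoryWriter {
    void createNode(String legacyRecord); // real code would add a JCR node here
    void save();                          // real code would call session.save()
}

class BatchMigrationSketch {

    /** Writes all records, saving after every batchSize records. Returns the number of saves. */
    static int migrate(List<String> records, int batchSize, RepositoryWriter writer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saves = 0;
        int pending = 0;
        for (String record : records) {
            writer.createNode(record);
            pending++;
            if (pending == batchSize) {
                writer.save();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the last, partially filled block
            writer.save();
            saves++;
        }
        return saves;
    }

    /** In-memory writer, used only to make the sketch runnable. */
    static class CountingWriter implements RepositoryWriter {
        final List<String> transientNodes = new ArrayList<>();
        int savedNodes = 0;

        @Override
        public void createNode(String legacyRecord) {
            transientNodes.add(legacyRecord);
        }

        @Override
        public void save() {
            savedNodes += transientNodes.size();
            transientNodes.clear();
        }
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            records.add("record-" + i);
        }
        CountingWriter writer = new CountingWriter();
        int saves = migrate(records, 500, writer);
        System.out.println(saves + " saves, " + writer.savedNodes + " nodes saved");
    }
}
```

With 1200 records and a block size of 500, this performs three saves (500, 500, 200).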

The session.save() call slows the whole process down, mainly because of the database
operations. I traced the SQL statements, and here is what I found:

The following steps are repeated very often (I guess for each new node):

- Selects on the versioning bundle

- Update of the global revision (committed together with the journal insert)

- Inserts and updates on the versioning bundle (committed)

- Selects on the workspace bundle

- Insert into the journal (committed)

- Update of the local revisions (auto-committed)

After this, all of the workspace bundles are saved at once, in one database transaction.

Two points:

1. Versioning

Most of my nodes (about 90%) only ever exist in one version, but because some of them are
versioned, I need the versionable node type. For all the others a version history is still
created, which costs database space and write performance. Why can't the version history be
created only when a node is first checked in? That would save space and time for nodes that
are versionable but never actually versioned. Or is there another solution for a situation
like this?

I've also done some testing with multiple sessions and multithreading. As a result, all but
one thread were waiting for the exclusive read/write lock of the version manager, so
multithreading brings no gain, as expected.

2. Operations for each node

Write performance could be improved by a factor of 4 to 5 if the per-node operations were
bundled into transactions, as is already done for inserting/updating the workspace bundles,
reducing the number of commits to a minimum. Storing all the workspace bundles takes nearly
the same time as storing the versioning information for one node (one cycle). The updates of
the global revision and the local revisions could be done once per save instead of once per
changed node, which would cut the time spent on them to a minimum.
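As a rough count of what bundling would save, here is a small sketch based on my reading of the trace: three commits per node cycle (journal insert together with the global revision, versioning bundle, local revisions) plus one commit for all workspace bundles. The "bundled" figure assumes each of these could be committed once per save, which is a proposal and not something Jackrabbit does today. It counts commits only and says nothing about the actual time per commit.

```java
class CommitCountSketch {

    // Commits per node cycle as read from the SQL trace (an interpretation):
    // journal insert + global revision, versioning bundle, local revisions.
    static final int COMMITS_PER_NODE_CYCLE = 3;

    // All workspace bundles are written in a single transaction.
    static final int WORKSPACE_BUNDLE_COMMITS = 1;

    /** Commits for one session.save() with the observed per-node cycles. */
    static long observedCommits(int nodesPerSave) {
        return (long) COMMITS_PER_NODE_CYCLE * nodesPerSave + WORKSPACE_BUNDLE_COMMITS;
    }

    /** Commits for one session.save() if every step were committed once per save (hypothetical). */
    static long bundledCommits() {
        return COMMITS_PER_NODE_CYCLE + WORKSPACE_BUNDLE_COMMITS;
    }

    public static void main(String[] args) {
        int nodesPerSave = 500;
        System.out.println("observed commits per save: " + observedCommits(nodesPerSave));
        System.out.println("bundled commits per save:  " + bundledCommits());
    }
}
```

For a block of 500 nodes this gives 1501 commits per save as observed, against 4 if everything were bundled.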

For now, I'm going to work around my performance problems by using multiple repositories and splitting the data.

Regards, Robert

Kind regards

i. A. Robert Seidel, Software Infrastructure, Senior Professional
D-23552 Lübeck, Kanalstraße 62-64
Tel. +49-451-2928938-130
Fax +49-451-2928938-333
AEB Gesellschaft zur Entwicklung von Branchen-Software mbH
Headquarters: Stuttgart
Court of registration: Amtsgericht Stuttgart, HRB 84 31
Place of jurisdiction: Stuttgart
Managing directors: Jochen Günzel, Markus Meißner
