jackrabbit-users mailing list archives

From: docxa <...@docxa.com>
Subject: Performance issue when removing high amount of nodes
Date: Wed, 05 Jan 2011 09:24:11 GMT

Hi,

We have to store in our repository a high amount of data, using this kind of
tree:

Project1
|_Stream1
  |__Record1
  |__Record2
  ...
  |__Record120000
...
|_Stream2
  |__Record1
  |__Record2
  ...
  |__Record120000

etc.

It takes some time to add those records, which was expected, but removing
them is even more time-consuming (sometimes it even crashes the JVM).
I understand it has to do with Jackrabbit putting it all in memory to check
for referential integrity violations.

While searching for answers on the mailing list I saw two ways of dealing
with this:
1- Deactivating referential integrity checking. I tried that, and it did not
seem to speed up the process, so I may be doing it wrong. (And I guess it's
quite wrong to do it at all.)
2- Removing nodes recursively, in packs, saving the session between packs
(see the sketch below).
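
For reference, by "packs" I mean something like the sketch below (assuming
the JCR 2.0 API; PackRemover, removeInPacks and the pack size of 500 are my
own names and guesses, not anything from Jackrabbit):

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class PackRemover {
    private static final int PACK_SIZE = 500; // tuning knob, not a Jackrabbit constant

    // Remove all children of a stream node in packs, saving after each pack
    // so the transient change set (and memory use) stays bounded.
    public static void removeInPacks(Session session, String streamPath)
            throws RepositoryException {
        Node stream = session.getNode(streamPath);
        while (stream.hasNodes()) {
            NodeIterator it = stream.getNodes();
            int removed = 0;
            // re-fetch the iterator after every save so we never walk over
            // entries that have already been removed
            while (it.hasNext() && removed < PACK_SIZE) {
                it.nextNode().remove();
                removed++;
            }
            session.save(); // persist this pack before starting the next one
        }
    }
}

Even done this way, each save seems to get slower as the parent node
accumulates children, which leads me to the next observation.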

I noticed that when using the second method, the more children a node has,
the longer it takes to remove any of them. So I guess it would be best to
split the records across multiple subtrees, e.g. with a bucket layout like
the one sketched below.
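
To make that concrete, I am thinking of a bucket layout along these lines
(purely illustrative names and sizes, assuming ~120000 records per stream):

// Hypothetical layout: Stream1/b00..b11/s00..s99/RecordN, so that no node
// ever has more than about 100 children instead of 120000.
public class BucketPaths {
    public static String bucketPath(String streamPath, long recordId) {
        long top = recordId / 10000;       // 12 top-level buckets of 10000 records
        long sub = (recordId / 100) % 100; // 100 sub-buckets of 100 records each
        return String.format("%s/b%02d/s%02d/Record%d",
                streamPath, top, sub, recordId);
    }
}

Each leaf node would then hold about a hundred children, which should keep
both the add and the remove operations cheap if the flat hierarchy really is
the problem.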

So I'd like to know if there is a better way of organizing my data in order
to improve the add and remove operations. I'd also like to know whether
deactivating referential integrity checking is really risky, and how I'm
supposed to do it. (I tried subclassing RepositoryImpl and calling
setReferentialIntegrityChecking, but it didn't seem to change anything.)
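
For the record, the subclassing attempt looked roughly like this (a sketch;
I am assuming setReferentialIntegrityChecking(String, boolean) is the
protected method on org.apache.jackrabbit.core.RepositoryImpl, please
correct me if the signature differs in the current version):

import javax.jcr.RepositoryException;

import org.apache.jackrabbit.core.RepositoryImpl;
import org.apache.jackrabbit.core.config.RepositoryConfig;

// RepositoryImpl's constructor and setReferentialIntegrityChecking(...)
// are protected, so the subclass only widens their visibility.
public class BulkRepository extends RepositoryImpl {

    protected BulkRepository(RepositoryConfig config) throws RepositoryException {
        super(config);
    }

    public static BulkRepository create(RepositoryConfig config)
            throws RepositoryException {
        return new BulkRepository(config);
    }

    public void setRefIntegrityChecking(String workspaceName, boolean enabled)
            throws RepositoryException {
        super.setReferentialIntegrityChecking(workspaceName, enabled);
    }
}

I call setRefIntegrityChecking(workspace, false) before the bulk remove and
set it back to true afterwards. My understanding is that the risk is ending
up with dangling REFERENCE properties pointing at removed nodes, hence my
question about how dangerous this really is in practice.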

Thank you for your help.

A. Mariette
DOCXA
-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Performance-issue-when-removing-high-amount-of-nodes-tp3175050p3175050.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
