jackrabbit-dev mailing list archives

From Bart van der Schans <b.vandersch...@onehippo.com>
Subject Re: jackrabbit-core RepositoryChecker.fix() can fail with OOM
Date Wed, 18 Apr 2012 09:51:07 GMT
Hi Julian,

On Wed, Apr 18, 2012 at 11:28 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Hi there.
>
> (posting here instead of opening a ticket because JIRA is currently down)
>
> It appears that people are (ab)using the RepositoryChecker to fix the
> versioning information in their repo after *removing* the version storage.
> (It would be good to understand why this happens, but anyway...)

Could it be that people want to clean up their version history, as it
can grow quite large over time? We have had this request several times
from customers. An option could be to provide a more convenient way to
clean it up properly.

> The RepositoryChecker, as currently implemented, walks the repository,
> collects changes, and, when done, submits them as a single repository
> ChangeLog.
>
> This will not work if the number of affected nodes is big.
>
> Unfortunately, the checker is currently designed to do things in two steps;
> we could of course stop collecting changes after a threshold, then apply
> what we have, then re-run the checker. That would probably work, but would
> be slow on huge repositories.
>
> The best alternative I see is to add a checkAndFix() method that is allowed
> to apply ChangeLogs to the repository on the run (and of course to use that
> variant from within RepositoryImpl.doVersionRecovery()).
>
> Feedback appreciated, Julian
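
To make the threshold idea concrete, here is a rough Java sketch of
buffering fixes and applying them in bounded batches during the walk,
instead of as one big ChangeLog at the end. None of this is Jackrabbit
API: `BatchingFixer`, the generic fix type `F`, and the `applier`
callback (which stands in for "apply a ChangeLog to the repository")
are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers fixes and applies them in bounded batches. */
class BatchingFixer<F> {
    private final int batchSize;
    private final Consumer<List<F>> applier; // assumption: stands in for applying a ChangeLog
    private final List<F> pending = new ArrayList<>();
    private int batchesApplied = 0;

    BatchingFixer(int batchSize, Consumer<List<F>> applier) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.applier = applier;
    }

    /** Record one fix; flush as soon as the buffer reaches the threshold. */
    void add(F fix) {
        pending.add(fix);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Apply whatever is buffered; call once more at the end of the walk. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        applier.accept(new ArrayList<>(pending)); // copy, so the applier may keep the list
        pending.clear();
        batchesApplied++;
    }

    int batchesApplied() {
        return batchesApplied;
    }
}
```

With a batch size of 3 and 7 fixes, the applier would be called with
batches of 3, 3 and 1 (the last one from the final `flush()`), so the
memory held at any time is bounded by the batch size rather than by the
number of affected nodes.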

We (@Hippo) have been doing quite a bit of work on the consistency
checker lately. See the following issues:

https://issues.apache.org/jira/browse/JCR-3267
https://issues.apache.org/jira/browse/JCR-3265
https://issues.apache.org/jira/browse/JCR-3269
https://issues.apache.org/jira/browse/JCR-3277
https://issues.apache.org/jira/browse/JCR-3263

It might be interesting to see what options we have for implementing
such an approach. We found that building a complete hierarchy tree in
memory and then running the consistency checks against it is by far the
fastest way to do a complete check (something like 50 times faster).
But, as noted, it requires quite some memory for the check and possibly
for the fix. In our current tests we can create the in-memory model for
about 3 million nodes in 1GB of heap.
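
As a very rough illustration of the in-memory approach (a simplified
sketch, not our actual implementation), the model can be little more
than a map from node id to its parent id and child ids, checked in one
pass without further persistence reads. The names `HierarchyModel` and
`NodeRecord`, the use of plain strings as ids, and the three kinds of
problems reported are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory hierarchy model; not the Jackrabbit implementation. */
class HierarchyModel {
    /** Minimal per-node record: parent id plus child ids. */
    static final class NodeRecord {
        final String parentId;        // null for the root node
        final List<String> childIds;

        NodeRecord(String parentId, List<String> childIds) {
            this.parentId = parentId;
            this.childIds = childIds;
        }
    }

    private final Map<String, NodeRecord> nodes = new HashMap<>();

    /** Load one node into the model (in practice, read from the bundle store). */
    void put(String id, String parentId, List<String> childIds) {
        nodes.put(id, new NodeRecord(parentId, childIds));
    }

    /** One pass over the map; returns human-readable problem descriptions. */
    List<String> check() {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, NodeRecord> e : nodes.entrySet()) {
            String id = e.getKey();
            NodeRecord record = e.getValue();
            // A non-root node whose parent is not in the model is an orphan.
            if (record.parentId != null && !nodes.containsKey(record.parentId)) {
                problems.add("orphan: " + id);
            }
            for (String childId : record.childIds) {
                NodeRecord child = nodes.get(childId);
                if (child == null) {
                    problems.add("missing child: " + id + " -> " + childId);
                } else if (!id.equals(child.parentId)) {
                    problems.add("disconnected child: " + id + " -> " + childId);
                }
            }
        }
        return problems;
    }
}
```

Every lookup here is a hash map hit rather than a read from the
persistence manager, which is where the speedup comes from. The cost is
that the whole map has to fit in the heap: 1GB for about 3 million nodes
works out to roughly 350 bytes per node.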

Regards,
Bart
