jackrabbit-dev mailing list archives

From Julian Reschke <julian.resc...@gmx.de>
Subject Re: jackrabbit-core RepositoryChecker.fix() can fail with OOM
Date Wed, 18 Apr 2012 10:26:22 GMT
On 2012-04-18 11:51, Bart van der Schans wrote:
> Hi Julian,
> On Wed, Apr 18, 2012 at 11:28 AM, Julian Reschke<julian.reschke@gmx.de>  wrote:
>> Hi there.
>> (posting here instead of opening a ticket because JIRA is currently down)
>> It appears that people are (ab)using the RepositoryChecker to fix the
>> versioning information in their repo after *removing* the version storage.
>> (It would be good to understand why this happens, but anyway...)
> Could it be that people want to clean up their version history, as it
> can grow quite large over time? We have had this request several times
> from customers. An option could be to provide a more convenient way to
> clean it up properly.

Maybe (but not in this case).

>> The RepositoryChecker, as currently implemented, walks the repository,
>> collects changes, and, when done, submits them as a single repository
>> ChangeLog.
>> This will not work if the number of affected nodes is big.
>> Unfortunately, the checker is currently designed to do things in two steps;
>> we could of course stop collecting changes after a threshold, then apply
>> what we have, then re-run the checker. That would probably work, but would
>> be slow on huge repositories.
>> The best alternative I see is to add a checkAndFix() method that is allowed
>> to apply ChangeLogs to the repository on the run (and of course to use that
>> variant from within RepositoryImpl.doVersionRecovery()).
>> Feedback appreciated, Julian
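
A minimal sketch of the "apply on the run" idea described above, assuming a
hypothetical BatchingCollector that hands pending fixes to a sink whenever a
threshold is reached, so memory use is bounded by the batch size rather than by
the number of affected nodes. None of these names are Jackrabbit classes, and
the sink only records batch sizes where a real checker would apply a ChangeLog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these types are Jackrabbit classes. */
public class BatchingFixSketch {

    /** Collects pending fixes and passes them to a sink whenever the batch is full. */
    static final class BatchingCollector<T> {
        private final int threshold;
        private final Consumer<List<T>> sink;
        private List<T> pending = new ArrayList<>();

        BatchingCollector(int threshold, Consumer<List<T>> sink) {
            if (threshold < 1) {
                throw new IllegalArgumentException("threshold must be >= 1");
            }
            this.threshold = threshold;
            this.sink = sink;
        }

        /** Queues one fix and flushes as soon as the threshold is reached. */
        void add(T fix) {
            pending.add(fix);
            if (pending.size() >= threshold) {
                flush();
            }
        }

        /** Applies whatever is pending; a no-op when nothing is queued. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            List<T> batch = pending;
            pending = new ArrayList<>();
            sink.accept(batch);
        }
    }

    /** Simulates a walk that finds nodeCount broken nodes; returns applied batch sizes. */
    static List<Integer> simulate(int nodeCount, int threshold) {
        List<Integer> sizes = new ArrayList<>();
        BatchingCollector<String> collector =
                new BatchingCollector<>(threshold, batch -> sizes.add(batch.size()));
        for (int i = 0; i < nodeCount; i++) {
            collector.add("fix-node-" + i);
        }
        collector.flush(); // apply the final, possibly partial, batch
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 4)); // prints [4, 4, 2]
    }
}
```

Unlike the "stop, apply, re-run" variant, this keeps a single walk over the
repository; the trade-off is that the fix is no longer one atomic change.
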
> We (@Hippo) have been doing quite a bit of work on the consistency
> checker lately. See the following issues:
> https://issues.apache.org/jira/browse/JCR-3267
> https://issues.apache.org/jira/browse/JCR-3265
> https://issues.apache.org/jira/browse/JCR-3269
> https://issues.apache.org/jira/browse/JCR-3277
> https://issues.apache.org/jira/browse/JCR-3263

Saw that (and sorry for not providing feedback yet). But that was about 
the *ConsistencyChecker*, not the *RepositoryChecker*, right? (The 
latter fixes versioning inconsistencies, so it operates at a higher level).

> It might be interesting to see what kind of options we have to
> implement such an approach. We found that building a complete
> hierarchy tree in memory and then doing the consistency checks is by
> far the fastest way to run a complete check (something like 50 times
> faster). But as noted it will require quite some memory for the check
> and possibly for the fix. In our current tests we can create the
> in-memory model for about 3 million nodes in 1GB of heap.
> Regards,
> Bart
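
A rough sketch of the in-memory approach Bart describes, assuming a
hypothetical NodeRecord that holds only an id, a parent id and child ids. All
records are loaded into a map once, and parent/child links are then checked
without further storage access. The class names and the three problem kinds
are illustrative assumptions, not the checks the actual ConsistencyChecker
performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these are not Jackrabbit classes. */
public class InMemoryHierarchySketch {

    /** Minimal stand-in for a persisted node: its id, parent id, and child ids. */
    static final class NodeRecord {
        final String id;
        final String parentId; // null for the root
        final List<String> childIds;

        NodeRecord(String id, String parentId, String... childIds) {
            this.id = id;
            this.parentId = parentId;
            this.childIds = Arrays.asList(childIds);
        }
    }

    /** Indexes all records by id once, then checks parent/child links in memory. */
    static List<String> check(List<NodeRecord> records) {
        Map<String, NodeRecord> byId = new HashMap<>();
        for (NodeRecord r : records) {
            byId.put(r.id, r);
        }
        List<String> problems = new ArrayList<>();
        for (NodeRecord r : records) {
            if (r.parentId != null) {
                NodeRecord parent = byId.get(r.parentId);
                if (parent == null) {
                    problems.add("orphan: " + r.id);
                } else if (!parent.childIds.contains(r.id)) {
                    problems.add("not listed by parent: " + r.id);
                }
            }
            for (String childId : r.childIds) {
                if (!byId.containsKey(childId)) {
                    problems.add("missing child: " + childId + " of " + r.id);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<NodeRecord> records = Arrays.asList(
                new NodeRecord("root", null, "a", "b"),
                new NodeRecord("a", "root"),
                new NodeRecord("c", "x"));
        System.out.println(check(records));
        // prints [missing child: b of root, orphan: c]
    }
}
```

For scale, 3 million nodes in 1GB of heap works out to roughly 330 to 360
bytes per node, so a real model would need a far more compact representation
than the String-based records used here.
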
