jackrabbit-dev mailing list archives

From Julian Reschke <julian.resc...@gmx.de>
Subject Re: jackrabbit-core RepositoryChecker.fix() can fail with OOM
Date Thu, 19 Apr 2012 11:43:54 GMT
On 2012-04-18 13:05, Bart van der Schans wrote:
> On Wed, Apr 18, 2012 at 12:26 PM, Julian Reschke<julian.reschke@gmx.de>  wrote:
>> On 2012-04-18 11:51, Bart van der Schans wrote:
>>>
>>> Hi Julian,
>>>
>>> On Wed, Apr 18, 2012 at 11:28 AM, Julian Reschke<julian.reschke@gmx.de>
>>>   wrote:
>>>>
>>>> Hi there.
>>>>
>>>> (posting here instead of opening a ticket because JIRA is currently down)
>>>>
>>>> It appears that people are (ab)using the RepositoryChecker to fix the
>>>> versioning information in their repo after *removing* the version
>>>> storage.
>>>> (It would be good to understand why this happens, but anyway...)
>>>
>>>
>>> Could it be that people want to clean up their version history as it
>>> can grow quite large over time? We have had this request several times
>>> from customers. An option could be to provide a more convenient way to
>>> clean it up properly.
>>
>>
>> Maybe (but not in this case).
>>
>>
>>>> The RepositoryChecker, as currently implemented, walks the repository,
>>>> collects changes, and, when done, submits them as a single repository
>>>> ChangeLog.
>>>>
>>>> This will not work if the number of affected nodes is big.
>>>>
>>>> Unfortunately, the checker is currently designed to do things in two
>>>> steps; we could of course stop collecting changes after a threshold,
>>>> then apply what we have, then re-run the checker. That would probably
>>>> work, but would be slow on huge repositories.
>>>>
>>>> The best alternative I see is to add a checkAndFix() method that is
>>>> allowed to apply ChangeLogs to the repository on the run (and of course
>>>> to use that variant from within RepositoryImpl.doVersionRecovery()).
>>>>
>>>> Feedback appreciated, Julian
>>>
>>>
>>> We (@Hippo) have been doing quite a bit of work on the consistency
>>> checker lately. See the following issues:
>>>
>>> https://issues.apache.org/jira/browse/JCR-3267
>>> https://issues.apache.org/jira/browse/JCR-3265
>>> https://issues.apache.org/jira/browse/JCR-3269
>>> https://issues.apache.org/jira/browse/JCR-3277
>>> https://issues.apache.org/jira/browse/JCR-3263
>>
>>
>> Saw that (and sorry for not providing feedback yet). But that was about the
>> *ConsistencyChecker*, not the *RepositoryChecker*, right? (The latter fixes
>> versioning inconsistencies, so it operates at a higher level).
>
> It's about both ;-)
>
> I hope to have some free cycles soon to go over the issues. Some are
> straight fixes and some concern some larger changes which probably
> need some input from other developers as well.
>
> To provide some background information: we have had some serious
> issues with inconsistencies in the repository with several customers.
> We've invested quite some time in tracking down the root cause of
> these problems (I will send an email about that shortly) and creating
> a standalone checker that can quickly check and fix all current
> inconsistencies. We can now check and fix millions of nodes in a
> matter of minutes, although this comes at a cost of quite some memory
> usage. The current checks also didn't find all inconsistencies so we
> improved/added some checks.
> ...

Note that we have a test case for repository fixes (see 
AutoFixCorruptNode); it would probably be good to have test coverage for 
any functionality you're adding...
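
To make the "apply ChangeLogs on the run" idea from the quoted proposal concrete, here is a minimal sketch of the batching pattern. It is not Jackrabbit code: BatchingFixer, record(), flush() and the threshold parameter are hypothetical names chosen for illustration, and the Consumer stands in for whatever would actually submit a ChangeLog to the repository. The point is only that memory use is bounded by the threshold rather than by the number of affected nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of "check and fix on the run": fixes are buffered
 * and handed to an applier in bounded batches instead of being collected
 * into one large change set. None of these names are Jackrabbit APIs.
 */
class BatchingFixer<T> {
    private final int threshold;              // max fixes held in memory
    private final Consumer<List<T>> applier;  // stands in for "apply a ChangeLog"
    private final List<T> pending = new ArrayList<>();
    private int batchesApplied = 0;

    BatchingFixer(int threshold, Consumer<List<T>> applier) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.applier = applier;
    }

    /** Record one fix; apply the buffered batch once the threshold is reached. */
    void record(T fix) {
        pending.add(fix);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    /** Apply whatever is buffered; call once more at the end of the walk. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Hand over a copy so the applier cannot observe the buffer being cleared.
        applier.accept(new ArrayList<>(pending));
        pending.clear();
        batchesApplied++;
    }

    int batchesApplied() {
        return batchesApplied;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A repository walk would call record() for each node that needs repair and flush() once after the walk finishes. Whether partially applied batches are acceptable when the walk fails midway is a design question this sketch does not answer.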

Best regards, Julian
