From: Julian Reschke
Date: Wed, 18 Apr 2012 11:28:42 +0200
To: dev@jackrabbit.apache.org
Subject: jackrabbit-core RepositoryChecker.fix() can fail with OOM
Message-ID: <4F8E894A.2080407@gmx.de>

Hi there.

(posting here instead of opening a ticket because JIRA is currently down)

It appears that people are (ab)using the RepositoryChecker to fix the versioning information in their repo after *removing* the version storage. (It would be good to understand why this happens, but anyway...)

The RepositoryChecker, as currently implemented, walks the repository, collects changes, and, when done, submits them as a single repository ChangeLog. This will not work if the number of affected nodes is large, since all collected changes are held in memory until the walk is finished.

Unfortunately, the checker is currently designed to do things in two steps. We could of course stop collecting changes after a threshold, apply what we have, and then re-run the checker. That would probably work, but would be slow on huge repositories, because each re-run walks the repository again.

The best alternative I see is to add a checkAndFix() method that is allowed to apply ChangeLogs to the repository as it goes (and of course to use that variant from within RepositoryImpl.doVersionRecovery()).

Feedback appreciated,

Julian
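
To make the checkAndFix() idea a bit more concrete, here is a minimal sketch of "apply changes in bounded batches during a single walk". It is not Jackrabbit code: BatchingFixer, record(), flush() and the threshold parameter are hypothetical names, and the Consumer merely stands in for whatever would apply a ChangeLog to the repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not Jackrabbit API: collects changes during a single
 * walk and hands them to an applier whenever a threshold is reached, so that
 * memory use is bounded by the batch size rather than by the repository size.
 */
final class BatchingFixer<T> {
    private final int threshold;
    private final Consumer<List<T>> applier; // stand-in for "apply a ChangeLog"
    private List<T> pending = new ArrayList<>();
    private int applied = 0;

    BatchingFixer(int threshold, Consumer<List<T>> applier) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.applier = applier;
    }

    /** Records one change; applies the pending batch once it is full. */
    void record(T change) {
        pending.add(change);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    /** Applies whatever is pending; must be called once after the walk. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>(); // start a fresh batch before applying
        applier.accept(batch);
        applied += batch.size();
    }

    int appliedCount() {
        return applied;
    }

    public static void main(String[] args) {
        BatchingFixer<String> fixer = new BatchingFixer<String>(
                3, batch -> System.out.println("applying " + batch.size() + " changes"));
        for (int i = 0; i < 7; i++) {
            fixer.record("node-" + i); // in a real checker: one fix per broken node
        }
        fixer.flush(); // apply the remainder
        System.out.println("total applied: " + fixer.appliedCount());
    }
}
```

The point of the design is that the repository is walked only once, while the amount of uncommitted state never exceeds the threshold. Whether partially applied fixes are acceptable if the walk fails midway is a separate question that this sketch does not address.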