Date: Wed, 5 Jan 2011 01:24:11 -0800 (PST)
From: docxa
To: users@jackrabbit.apache.org
Message-ID: <1294219451813-3175050.post@n4.nabble.com>
Subject: Performance issue when removing high amount of nodes

Hi,

We have to store a large amount of data in our repository, using this kind of tree:

Project1
|_Stream1
|__Record1
|__Record2
...
|__Record120000
...
|_Stream2
|__Record1
|__Record2
...
|__Record120000
etc.

It takes some time to add those records, which was expected, but removing them takes even longer and sometimes even crashes the VM. I understand this has to do with Jackrabbit loading everything into memory to check for referential integrity violations.

While searching the mailing list for answers, I saw two ways of dealing with this:

1- Deactivate referential integrity checking. I tried that, and it did not seem to speed up the process, so I may be doing it wrong. (And I guess it's quite wrong to even do it.)
2- Remove nodes recursively, in batches.

I noticed that with the second method, the more children a node has, the longer it takes to remove some of them. So I guess it would be best to split the records across multiple subtrees.

So I'd like to know whether there is a better way of organizing my data to improve the add and remove operations. Also, is deactivating referential integrity checking really risky, and how am I supposed to do it? (I tried subclassing RepositoryImpl and using setReferentialIntegrityChecking, but it didn't seem to change anything.)

Thank you for your help.

A. Mariette
DOCXA
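
To make the subtree-splitting idea concrete, here is a minimal sketch of how a record number could be mapped to an intermediate "bucket" node so that no single parent holds all 120000 children. Everything in it is an assumption made for illustration: the class name RecordPaths, the method bucketedPath, the "bucket" node-name prefix, and the bucket size are hypothetical and are not part of Jackrabbit or the JCR API. It only computes the relative path string; creating or removing the nodes is left out.

```java
// Hypothetical helper: maps a 1-based record number to a path with one
// intermediate bucket level, e.g. Stream1/bucket119/Record120000.
final class RecordPaths {

    private RecordPaths() {
        // static helper only, not meant to be instantiated
    }

    /**
     * Builds a relative path of the form stream/bucketN/RecordM.
     *
     * @param stream       name of the stream node, e.g. "Stream1"
     * @param recordNumber 1-based record number, as in the tree above
     * @param bucketSize   maximum number of records under one bucket node
     */
    static String bucketedPath(String stream, int recordNumber, int bucketSize) {
        if (recordNumber < 1 || bucketSize < 1) {
            throw new IllegalArgumentException(
                    "recordNumber and bucketSize must both be >= 1");
        }
        // Records 1..bucketSize land in bucket 0, the next bucketSize in bucket 1, etc.
        int bucket = (recordNumber - 1) / bucketSize;
        return stream + "/bucket" + bucket + "/Record" + recordNumber;
    }
}
```

With a bucket size of 1000, RecordPaths.bucketedPath("Stream1", 120000, 1000) gives Stream1/bucket119/Record120000, so a stream with 120000 records would have 120 bucket children of at most 1000 records each. Whether this actually speeds up removal in a given Jackrabbit setup would still need to be measured.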