From: "Esteban Franqueiro"
Reply-To: dev@jackrabbit.apache.org
Date: Thu, 22 Nov 2007 10:43:35 -0300
Subject: Fw: Realtime datastore garbage collector
Re: Realtime datastore garbage collector

Hi Thomas.

> > dataStore.removeTransientIdentifiers(addedProps);
>
> There is a problem with this approach: an identifier can be added to
> multiple properties. Also, it may be used at other places. So you
> would need to keep a reference count as well. Also, you would need to
> be sure the reference counts are updated correctly ('transactional').

Can you provide a test for this scenario? Regardless of the solution
adopted, I think it's a good test to have. I ran a quick test here and
it didn't fail, but I'm not sure the test itself is correct.

> It would be a good idea to implement this, however I think with the
> current architecture of Jackrabbit (having multiple change logs,
> multiple caches, and multiple places where values are used), it is
> beyond my ability to verify that the implementation is correct. I just
> don't know enough about the Jackrabbit core, and there are not enough
> test cases in the Jackrabbit core that would allow automatic
> verification.
>
> A simpler mechanism would be to store back-references: each data
> record / identifier would know who references it. The garbage
> collection could then follow the back-references and check if they are
> still valid (and if not remove them). Items without valid back
> references could be deleted. This allows to delete very large objects
> quickly (if they are not used of course).

Can you elaborate on this? Maybe I can test the idea then.

> When we change the architecture of Jackrabbit (see also NGP) we should
> think about the data store.

Definitely :) We should change things with per-node concurrency in
mind. And maybe the data store could be more integrated... I guess
we'll see.
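To make the back-reference idea quoted above concrete, here is a minimal Java sketch. It is illustrative only and rests on assumptions that are not in this thread: the class BackReferenceIndex and its methods are hypothetical (not Jackrabbit API), references are keyed by property path strings, the index lives in memory, and the caller supplies the check for whether a property still holds a given identifier. Because each identifier keeps a set of referencing properties rather than a single owner, an identifier added to multiple properties is tracked correctly, and the set size plays the role of the reference count. It does not address the transactional concern; keeping the index in sync with the change log is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch, not Jackrabbit API: an in-memory index mapping each
// data record identifier to the property paths that may reference it.
// Updates are NOT transactional; persisting the index and keeping it in
// sync with the change log are out of scope here.
final class BackReferenceIndex {
    private final Map<String, Set<String>> backRefs = new HashMap<>();

    // Record that the property at propertyPath references the identifier.
    // One identifier may be referenced by many properties.
    void addReference(String identifier, String propertyPath) {
        backRefs.computeIfAbsent(identifier, k -> new HashSet<>()).add(propertyPath);
    }

    // Number of back-references currently recorded (the "reference count").
    int referenceCount(String identifier) {
        Set<String> refs = backRefs.get(identifier);
        return refs == null ? 0 : refs.size();
    }

    // One GC pass. isStillValid.test(identifier, propertyPath) must report
    // whether that property still holds that identifier. Stale
    // back-references are pruned; identifiers with none left are dropped
    // from the index and returned so the caller can delete their records.
    Set<String> collect(BiPredicate<String, String> isStillValid) {
        Set<String> unreferenced = new HashSet<>();
        Iterator<Map.Entry<String, Set<String>>> it = backRefs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            String id = entry.getKey();
            entry.getValue().removeIf(path -> !isStillValid.test(id, path));
            if (entry.getValue().isEmpty()) {
                unreferenced.add(id);
                it.remove();
            }
        }
        return unreferenced;
    }
}
```

Storing the referencing paths instead of a bare counter means a missed or stale update is corrected on the next pass, since every back-reference is re-validated before a record is deleted; the cost is the storage and lookups for the back-references themselves.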
> But at this time, I would argue it is safer to keep the data store
> mechanism as is, without trying to add more features (adding more data
> store implementations is not a problem of course), unless we really
> fix a bug. I think it makes more sense to spend the time improving the
> architecture of Jackrabbit before trying to add more complex
> algorithms to the data store (which are not required afterwards).

This is not another feature; it's the most useful version of the GC. I
think it's critical for large repositories to have a GC that
periodically reclaims unused space.

Regarding the scenario I presented, what I would like to know is
whether we consider it an acceptable risk or not. I'm still not sure
about this issue.

Regards,
Esteban Franqueiro
esteban.franqueiro@bea.com