jackrabbit-dev mailing list archives

From "Esteban Franqueiro" <esteban.franque...@bea.com>
Subject Fw: Realtime datastore garbage collector
Date Thu, 22 Nov 2007 13:43:35 GMT
Hi Thomas.

> > dataStore.removeTransientIdentifiers(addedProps);

> There is a problem with this approach: an identifier can be added to
> multiple properties. Also, it may be used at other places. So you
> would need to keep a reference count as well. Also, you would need to
> be sure the reference counts are updated correctly ('transactional').

Can you provide a test for this scenario?
Regarding the solution adopted, I think it's a good test to have.
I did run a quick test here and it didn't fail, but I'm not sure if it's correct.
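
A minimal sketch of the reference-count idea from the quoted paragraph, to make the scenario concrete. The class and method names are hypothetical and are not part of the Jackrabbit data store API. It only shows why one identifier shared by several properties needs a counter, and it does not attempt the transactional updates mentioned above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Jackrabbit code.
class RefCountedIdentifiers {
    // identifier -> number of properties currently using it
    private final Map<String, Integer> counts = new HashMap<>();

    // Record that one more property uses this identifier.
    synchronized void addReference(String identifier) {
        counts.merge(identifier, 1, Integer::sum);
    }

    // Record that one property stopped using this identifier.
    // Returns true only when no property uses it any more.
    synchronized boolean removeReference(String identifier) {
        Integer current = counts.get(identifier);
        if (current == null) {
            throw new IllegalStateException("No references recorded for " + identifier);
        }
        if (current == 1) {
            counts.remove(identifier);
            return true;
        }
        counts.put(identifier, current - 1);
        return false;
    }

    // Current number of recorded references (0 if unknown).
    synchronized int count(String identifier) {
        return counts.getOrDefault(identifier, 0);
    }
}
```

With two properties sharing one identifier, the first removeReference call returns false and only the second returns true, so the record would be kept until the last user is gone.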

> It would be a good idea to implement this, however I think with the
> current architecture of Jackrabbit (having multiple change logs,
> multiple caches, and multiple places where values are used), it is
> beyond my ability to verify that the implementation is correct. I just
> don't know enough about the Jackrabbit core, and there are not enough
> test cases in the Jackrabbit core that would allow automatic
> verification.

> A simpler mechanism would be to store back-references: each data
> record / identifier would know who references it. The garbage
> collection could then follow the back-references and check if they are
> still valid (and if not remove them). Items without valid back
> references could be deleted. This allows to delete very large objects
> quickly (if they are not used of course).

Can you elaborate on this? Maybe I can test the idea then.
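
A rough sketch of how I read the back-reference proposal, assuming each identifier keeps the set of property paths that claim to use it. All names here are hypothetical, and the validity check is passed in as a callback because the real lookup would depend on the Jackrabbit core:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical illustration only; not Jackrabbit code.
class BackReferenceStore {
    // identifier -> property paths that claim to reference it
    private final Map<String, Set<String>> backRefs = new HashMap<>();

    // Record that the property at propertyPath references the identifier.
    void addBackReference(String identifier, String propertyPath) {
        backRefs.computeIfAbsent(identifier, k -> new HashSet<>()).add(propertyPath);
    }

    // Follow every back-reference, drop the ones that are no longer valid,
    // and return the identifiers left without any valid back-reference.
    // stillReferences.test(identifier, path) must report whether the property
    // at 'path' still holds 'identifier'.
    List<String> collectGarbage(BiPredicate<String, String> stillReferences) {
        List<String> deleted = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it = backRefs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            String identifier = entry.getKey();
            entry.getValue().removeIf(path -> !stillReferences.test(identifier, path));
            if (entry.getValue().isEmpty()) {
                deleted.add(identifier);
                it.remove();
            }
        }
        return deleted;
    }
}
```

In this form the collector only checks the stored back-references of each record rather than scanning the whole repository, which is where the quick deletion of large unused objects would come from.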

> When we change the architecture of Jackrabbit (see also NGP) we should
> think about the data store.

Definitely :)
We should change things with per-node concurrency in mind. And maybe the data store could
be more integrated... I guess we'll see.

> But at this time, I would argue it is safer to keep the data store
> mechanism as is, without trying to add more features (adding more data
> store implementations is not a problem of course), unless we really
> fix a bug. I think it makes more sense to spend the time improving the
> architecture of Jackrabbit before trying to add more complex
> algorithms to the data store (which are not required afterwards).

This is not another feature, it's the most useful version of the GC. I think it's critical
for large repositories to have a GC that periodically reclaims unused space.

Regarding the scenario I presented, what I would like to know is whether we consider it an
acceptable risk or not. I'm still not sure about this issue.


Esteban Franqueiro

