jackrabbit-dev mailing list archives

From "Thomas Mueller" <thomas.tom.muel...@gmail.com>
Subject Re: Realtime datastore garbage collector
Date Tue, 20 Nov 2007 07:58:49 GMT

> dataStore.removeTransientIdentifiers(addedProps);

There is a problem with this approach: an identifier can be added to
multiple properties, and it may also be used in other places. So you
would need to keep a reference count as well, and you would need to
be sure the reference counts are updated correctly (transactionally).
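
To illustrate the reference-counting idea, here is a minimal sketch in
Java. The class IdentifierRefCounts and its methods are hypothetical and
not part of Jackrabbit; identifiers are plain strings for simplicity.
It only shows the in-memory bookkeeping, not the transactional part,
which is the hard problem mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Jackrabbit: counts how many
// places currently use each data store identifier.
class IdentifierRefCounts {
    private final Map<String, Integer> counts = new HashMap<>();

    // Record one more use of the identifier; returns the new count.
    synchronized int acquire(String id) {
        return counts.merge(id, 1, Integer::sum);
    }

    // Record one fewer use; returns the remaining count.
    // Only when this reaches 0 may the record be deleted.
    synchronized int release(String id) {
        Integer current = counts.get(id);
        if (current == null) {
            throw new IllegalStateException("not referenced: " + id);
        }
        if (current == 1) {
            counts.remove(id);
            return 0;
        }
        counts.put(id, current - 1);
        return current - 1;
    }

    // True while at least one use of the identifier is recorded.
    synchronized boolean isReferenced(String id) {
        return counts.containsKey(id);
    }
}
```

With such a counter, an identifier shared by two properties survives the
removal of one of them, because release() returns 1 rather than 0.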

It would be a good idea to implement this. However, I think that with the
current architecture of Jackrabbit (having multiple change logs,
multiple caches, and multiple places where values are used), it is
beyond my ability to verify that the implementation is correct. I just
don't know enough about the Jackrabbit core, and there are not enough
test cases in the Jackrabbit core that would allow automatic verification.

A simpler mechanism would be to store back-references: each data
record / identifier would know who references it. The garbage
collection could then follow the back-references and check if they are
still valid (and, if not, remove them). Items without valid
back-references could be deleted. This would allow very large objects
to be deleted quickly (if they are not used, of course).
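
The back-reference idea could be sketched roughly as follows. All names
here (BackReferenceStore, collectGarbage, the stillReferences callback)
are assumptions made for illustration and are not Jackrabbit API. The
callback stands in for looking up a property and checking whether it
still holds the given identifier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch: each record identifier maps to the set of
// property paths that are believed to reference it.
class BackReferenceStore {
    private final Map<String, Set<String>> backRefs = new HashMap<>();

    // Remember that the property at propertyPath references the record.
    void addBackReference(String id, String propertyPath) {
        backRefs.computeIfAbsent(id, k -> new HashSet<>()).add(propertyPath);
    }

    // stillReferences.test(propertyPath, id) must return true only if
    // the property currently holds this identifier. Stale back-references
    // are dropped; records left without any are removed and returned.
    List<String> collectGarbage(BiPredicate<String, String> stillReferences) {
        List<String> deleted = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it =
                backRefs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            String id = entry.getKey();
            entry.getValue().removeIf(path -> !stillReferences.test(path, id));
            if (entry.getValue().isEmpty()) {
                it.remove();
                deleted.add(id);
            }
        }
        return deleted;
    }
}
```

In this sketch the collector never scans the whole repository; it only
checks the properties listed for each record, which is why an unused
large object can be found and deleted quickly.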

When we change the architecture of Jackrabbit (see also NGP) we should
think about the data store.

But at this time, I would argue it is safer to keep the data store
mechanism as is, without trying to add more features (adding more data
store implementations is not a problem of course), unless we really
fix a bug. I think it makes more sense to spend the time improving the
architecture of Jackrabbit before trying to add more complex
algorithms to the data store (which are not required afterwards).

