jackrabbit-dev mailing list archives

From Alexander Klimetschek <aklim...@day.com>
Subject Re: [jr3] MVCC
Date Wed, 17 Feb 2010 19:12:29 GMT
On Wed, Feb 17, 2010 at 18:06, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
>
> As discussed already before, switching from the current design where
> all writes essentially block all concurrent repository access to a
> MVCC design that decouples all read access from concurrent writes
> would help us solve a number of performance bottlenecks. In addition
> to the performance improvements the MVCC design avoids many of the
> internal synchronization and cache invalidation mechanisms we have and
> thus could help reduce complexity and avoid potential concurrency
> issues.
>
> Taken to the extreme we could even use MVCC to avoid the troublesome
> situation where subsequent calls to a method like Property.getString()
> return different values or may even start throwing exceptions because
> of the actions of another session. With MVCC we could ensure that the
> content seen by a session remains constant until an explicit
> Session.refresh() call is made or something like an observation
> listener is registered.
>
> Implementing this will be somewhat tricky as it affects large areas of
> the core code and spans over the current PersistenceManager boundary.
> Also the interaction with things like the search index or other
> resources stored outside the main persistence mechanism is non-trivial
> (but see the thread on uniform persistence).
>
> Should we do this? How?

I think we should. I once started a prototype to play around with
that, using an RDBMS with thoughtfully designed tables and indexes
for it.

Most importantly, for each node multiple versions (not jcr versions)
can be present, up to the point that the transient space is stored
inside as well. Once a session starts, it will stay on a certain
version for reading and it won't see the updates of other sessions
(until it calls refresh() or writes). This is done by keeping the
current version of a node that gets overwritten, so that the reading
session can still see the same state as when it started. A garbage
collector can delete old versions that are no longer referenced from
any session from time to time. This is basically a strict copy-on-read
model as described in the spec, as opposed to the current
copy-on-write implementation in Jackrabbit.
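
As a rough illustration of the model described above, here is a minimal
in-memory Java sketch. It is not the prototype mentioned in this message
(which used an RDBMS). The names MvccStore, MvccSession, pin, unpin and
collectGarbage are made up for illustration, and as a simplification
writes are committed immediately instead of going through a transient
space.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical store: keeps every committed value of a path, keyed by revision.
final class MvccStore {
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();
    // Revisions that open sessions are reading at, with a reference count.
    private final TreeMap<Long, Integer> pinned = new TreeMap<>();
    private long head = 0;

    // Pin the current head revision for a session and return it.
    synchronized long pin() {
        pinned.merge(head, 1, Integer::sum);
        return head;
    }

    // Release one pin; the entry disappears when the count reaches zero.
    synchronized void unpin(long revision) {
        pinned.computeIfPresent(revision, (r, n) -> n == 1 ? null : n - 1);
    }

    // Return the newest value at or before the given revision, or null.
    synchronized String read(String path, long revision) {
        TreeMap<Long, String> history = versions.get(path);
        if (history == null) return null;
        Map.Entry<Long, String> entry = history.floorEntry(revision);
        return entry == null ? null : entry.getValue();
    }

    // A write never overwrites: it adds a new version under a new revision.
    synchronized long write(String path, String value) {
        head++;
        versions.computeIfAbsent(path, p -> new TreeMap<>()).put(head, value);
        return head;
    }

    // Drop versions that neither the head nor any pinned revision can see.
    synchronized int collectGarbage() {
        int removed = 0;
        for (TreeMap<Long, String> history : versions.values()) {
            Set<Long> live = new HashSet<>();
            live.add(history.lastKey()); // visible to sessions starting now
            for (Long revision : pinned.keySet()) {
                Long visible = history.floorKey(revision);
                if (visible != null) live.add(visible);
            }
            int before = history.size();
            history.keySet().retainAll(live);
            removed += before - history.size();
        }
        return removed;
    }
}

// Hypothetical session: reads at the revision that was current at login.
final class MvccSession {
    private final MvccStore store;
    private long base;

    MvccSession(MvccStore store) {
        this.store = store;
        this.base = store.pin();
    }

    String read(String path) {
        return store.read(path, base);
    }

    // Writing also moves the session forward, as refresh() would.
    void write(String path, String value) {
        store.write(path, value);
        refresh();
    }

    // Pin the new head before releasing the old revision.
    void refresh() {
        long old = base;
        base = store.pin();
        store.unpin(old);
    }

    void close() {
        store.unpin(base);
    }
}
```

In this sketch a session that never calls refresh() or close() keeps
its pinned revision alive, so collectGarbage() has to retain the
versions visible at that revision.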

The problem with the above copy-on-read is that you often have
long-running sessions for observation listeners that would keep the
garbage collector from doing any cleanup. This can lead to considerable
disk space usage. Also, I am not sure how the concept of observation
listeners works with a copy-on-read implementation. Maybe one
should/has to call refresh() in the event listener before accessing
the modified items.

It would also be quite easy to change the behavior to copy-on-write,
and it could be configurable, so that one could choose depending on
the application.
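
One way such a switch might look, as a hedged Java sketch: the mode
only decides which revision an unmodified item is read at. All names
here (IsolationMode, VersionedContent, ModeSession) are hypothetical,
and the sketch models the visible behavior rather than any actual
copying: COPY_ON_READ reads at the session's start revision, while
COPY_ON_WRITE reads the latest committed state until the session
itself has modified the item.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical configuration switch between the two behaviors.
enum IsolationMode { COPY_ON_READ, COPY_ON_WRITE }

// Hypothetical multi-versioned content: path -> (revision -> value).
final class VersionedContent {
    private final Map<String, TreeMap<Long, String>> history = new HashMap<>();
    private long head = 0;

    synchronized void commit(String path, String value) {
        head++;
        history.computeIfAbsent(path, p -> new TreeMap<>()).put(head, value);
    }

    synchronized long head() {
        return head;
    }

    // Newest value at or before the given revision, or null if none.
    synchronized String readAt(String path, long revision) {
        TreeMap<Long, String> versions = history.get(path);
        if (versions == null) return null;
        Map.Entry<Long, String> entry = versions.floorEntry(revision);
        return entry == null ? null : entry.getValue();
    }
}

// Hypothetical session whose read behavior depends on the configured mode.
final class ModeSession {
    private final VersionedContent content;
    private final IsolationMode mode;
    private final long startRevision;
    // Unsaved changes of this session; they always win over committed state.
    private final Map<String, String> transientSpace = new HashMap<>();

    ModeSession(VersionedContent content, IsolationMode mode) {
        this.content = content;
        this.mode = mode;
        this.startRevision = content.head();
    }

    void set(String path, String value) {
        transientSpace.put(path, value);
    }

    String read(String path) {
        if (transientSpace.containsKey(path)) {
            return transientSpace.get(path);
        }
        long revision = mode == IsolationMode.COPY_ON_READ
                ? startRevision    // stable snapshot from session start
                : content.head();  // latest committed state
        return content.readAt(path, revision);
    }
}
```

With this shape the store itself is the same in both modes; only the
choice of read revision differs, which is why making it configurable
per application seems plausible.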

Note that these are merely theoretical points so far and need to be
practically proven (the prototype is still in early stages...).

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com
