jackrabbit-dev mailing list archives

From "David Nuescheler" <da...@day.com>
Subject [RT] Evolution of Persistence
Date Fri, 18 Apr 2008 22:47:07 GMT

To address some of the performance and scalability
limitations that we have run into in the past, and building
on our growing experience, I would like to propose that we
kick off a discussion about evolving the persistence model
of Jackrabbit.

In various conversations on the topic of persistence I
observed that free horizontal scalability in a cluster,
for both reads and writes, is something we need to keep
in mind. Because of that, I think we should at some point
make the persistence layer aware of the hierarchy, to
allow for much more fine-grained locks on a journal
basis. I also think that we could look into ways for a
cluster node to indicate or determine which updates in
the cluster need to be dispatched to it. To allow writes
to the repository to scale, I think it is important that
the cluster master does not need to actively write the
payload of transactions but only has to orchestrate them.
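
As a rough illustration of how a cluster node might determine
which updates concern it once the persistence layer knows about
the hierarchy, here is a minimal sketch in which a node declares
the subtrees it cares about and checks the paths touched by a
journal record against them. The class SubtreeInterest and its
methods are hypothetical and are not part of Jackrabbit; paths
are assumed to be absolute, slash-separated strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not a Jackrabbit API: a cluster node's declared
// interest in parts of the hierarchy, expressed as subtree root paths.
final class SubtreeInterest {
    private final List<String> roots = new ArrayList<>();

    // Register a subtree (for example "/content/site-a") this node cares about.
    void watch(String path) {
        roots.add(normalize(path));
    }

    // True if at least one changed path lies inside a watched subtree,
    // i.e. the journal record would need to be dispatched to this node.
    boolean isRelevant(List<String> changedPaths) {
        for (String changed : changedPaths) {
            for (String root : roots) {
                if (covers(root, normalize(changed))) {
                    return true;
                }
            }
        }
        return false;
    }

    // A root covers itself and every descendant. Comparing against
    // root + "/" keeps "/a" from matching the sibling "/ab".
    static boolean covers(String root, String path) {
        return root.equals("/") || path.equals(root) || path.startsWith(root + "/");
    }

    // Drop a trailing slash so "/a/" and "/a" compare equal.
    private static String normalize(String path) {
        if (path.length() > 1 && path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }
}
```

The same prefix test could in principle decide whether two
change sets touch overlapping subtrees and therefore need the
same lock, but that is an assumption on top of the proposal.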

Of course I also think that, based on our experience with
the current persistence model, we need to make sure
that we deliver a scalable solution for all aspects
of the JCR API that employ RangeIterators. This
includes lists of child nodes, references and the like.
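
To make the scalability concern concrete: an iterator over a
very large child node list scales only if it does not have to
materialize the whole list up front. The following is a minimal
sketch of a lazily paged iterator. PageLoader and PagedIterator
are hypothetical names, not Jackrabbit or JCR types; skip and
getPosition only loosely mirror the methods of the same name on
javax.jcr.RangeIterator, and the loader is assumed to return an
empty list for offsets past the end.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical: fetches up to 'limit' entries starting at 'offset'
// from wherever the child list is stored.
interface PageLoader<T> {
    List<T> load(long offset, int limit);
}

// Hypothetical sketch: holds at most one page in memory at a time.
final class PagedIterator<T> implements Iterator<T> {
    private final PageLoader<T> loader;
    private final int pageSize;
    private List<T> page = Collections.emptyList();
    private int indexInPage = 0;
    private long position = 0;        // elements returned or skipped so far
    private boolean exhausted = false;

    PagedIterator(PageLoader<T> loader, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.loader = loader;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        // The buffer is empty here, so 'position' is the next absolute offset.
        page = loader.load(position, pageSize);
        indexInPage = 0;
        if (page.size() < pageSize) {
            exhausted = true;         // a short page means the end was reached
        }
        return !page.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        position++;
        return page.get(indexInPage++);
    }

    // Skips n elements without loading them. Unlike RangeIterator.skip,
    // this sketch does not fail when skipping past the end.
    void skip(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must not be negative");
        }
        position += n;
        page = Collections.emptyList();
        indexInPage = 0;
        exhausted = false;
    }

    long getPosition() {
        return position;
    }
}
```

Whether the persistence layer can serve such offset-based page
requests efficiently is exactly the open question; the sketch
only shows what the consuming side would need from it.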

I would like to find out if we can take an iterative,
evolutionary approach towards a more efficient and more
scalable persistence.
As a next step I would like to propose that we build
an option that allows for an index of the cluster, which
in turn allows us to build a journal backed persistence
manager using the current PM interface, which would
essentially have a no-op for writes.
As a further step I would like to propose that we have
the change log operate directly on the journal
as well. I would call this "journal centric" persistence.
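
To illustrate the idea of a journal backed persistence manager
whose writes are a no-op, here is a minimal in-memory sketch.
It deliberately does not implement Jackrabbit's actual
PersistenceManager interface. Journal, Record and
JournalBackedStore are hypothetical, item states are reduced to
plain strings, and the "index" is assumed to be a map from item
id to the latest journal revision for that item.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical append-only journal with an index over its records.
final class Journal {
    static final class Record {
        final String itemId;
        final String state;           // null marks a removal

        Record(String itemId, String state) {
            this.itemId = itemId;
            this.state = state;
        }
    }

    private final List<Record> records = new ArrayList<>();
    // Index: item id -> revision (position) of its latest record.
    private final Map<String, Integer> latest = new HashMap<>();

    // Appends a change and returns its revision number.
    int append(String itemId, String state) {
        records.add(new Record(itemId, state));
        int revision = records.size() - 1;
        latest.put(itemId, revision);
        return revision;
    }

    // Reads the latest state through the index, without scanning the journal.
    String read(String itemId) {
        Integer revision = latest.get(itemId);
        return revision == null ? null : records.get(revision).state;
    }

    int size() {
        return records.size();
    }
}

// Hypothetical stand-in for a persistence manager: reads come from the
// journal, and store() does nothing because the change is assumed to
// have been written to the journal already.
final class JournalBackedStore {
    private final Journal journal;

    JournalBackedStore(Journal journal) {
        this.journal = journal;
    }

    void store(Map<String, String> changeLog) {
        // Intentionally a no-op: the information is persisted only once.
    }

    String load(String itemId) {
        return journal.read(itemId);
    }

    boolean exists(String itemId) {
        return load(itemId) != null;
    }
}
```

In this sketch every piece of information lives in the journal
only, so comparing it with the current double write would give
a first, rough indication of the possible gain.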

I think this could give us a good indication of how much
performance gain we can get out of making sure that
information is only persisted once (ignoring the query
index for now). It should also allow us to test a purely
journal based persistence and then take it from there to
evolve into a more MVCC based and more freely scalable
architecture.

