jackrabbit-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: NGP: Simple journal and tree structures
Date Wed, 14 Nov 2007 13:41:04 GMT

On Nov 14, 2007 11:21 AM, (Berry) A.W. van Halderen
<b.vanhalderen@hippo.nl> wrote:
> Querying: which would be the layer where the NGP operates?

Optimally I'd put the query indexes into the NGP tree model, for
example as special rep:index properties of a node. This way we could
have per-subtree indexes and be able to keep accessing an older
version (or a branch, like in an uncommitted transaction) of the tree
structure at full performance while new content is being written to
the head of the repository.
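Here is a minimal sketch of the per-subtree index idea. It assumes an immutable in-memory tree where an update copies only the path from the changed node up to the root. The class INode and its index shape (property value mapped to relative paths) are hypothetical stand-ins for rep:index, not an actual Jackrabbit or NGP API. Older roots are never modified, so a query against an older revision uses that revision's own index at the same cost as a query against the head.

```java
import java.util.*;

/**
 * Hypothetical immutable tree node. Each node carries a derived per-subtree
 * index (property value -> relative paths), standing in for "rep:index".
 * Updates never modify a node in place: only the path from the changed node
 * to the root is copied, and untouched siblings are shared between revisions.
 */
final class INode {
    final SortedMap<String, String> props;
    final SortedMap<String, INode> children;
    private final Map<String, SortedSet<String>> index;

    INode(SortedMap<String, String> props, SortedMap<String, INode> children) {
        this.props = Collections.unmodifiableSortedMap(new TreeMap<String, String>(props));
        this.children = Collections.unmodifiableSortedMap(new TreeMap<String, INode>(children));
        Map<String, SortedSet<String>> idx = new HashMap<>();
        // "." marks a match on this node itself.
        for (String v : this.props.values()) {
            idx.computeIfAbsent(v, k -> new TreeSet<>()).add(".");
        }
        // Merge each child's index, prefixing paths with the child's name.
        for (Map.Entry<String, INode> c : this.children.entrySet()) {
            for (Map.Entry<String, SortedSet<String>> e : c.getValue().index.entrySet()) {
                for (String p : e.getValue()) {
                    String rel = p.equals(".") ? c.getKey() : c.getKey() + "/" + p;
                    idx.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(rel);
                }
            }
        }
        this.index = idx;
    }

    static INode empty() {
        return new INode(new TreeMap<>(), new TreeMap<>());
    }

    /** Returns a new root with name=value set at path; this root is left untouched. */
    INode setProperty(List<String> path, String name, String value) {
        if (path.isEmpty()) {
            SortedMap<String, String> p = new TreeMap<>(props);
            p.put(name, value);
            return new INode(p, children);
        }
        INode child = children.getOrDefault(path.get(0), empty());
        SortedMap<String, INode> c = new TreeMap<>(children);
        c.put(path.get(0), child.setProperty(path.subList(1, path.size()), name, value));
        return new INode(props, c); // sibling subtrees are shared, not copied
    }

    /** Answers "which descendants have a property equal to v?" from the stored index. */
    SortedSet<String> query(String v) {
        return Collections.unmodifiableSortedSet(index.getOrDefault(v, Collections.emptySortedSet()));
    }
}
```

Note that this sketch rebuilds the index of every copied node from its children's indexes. Each write therefore costs time and space proportional to the indexes merged along the path, and that cost is the open question about space-efficient append-only index formats.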

Of course, coming up with an index format that is space-efficient
enough in an append-only mode is still an open question (though
Lucene's segment files do look promising), so I'm not yet sure if the
above vision can really be implemented. That's why I'm prototyping.
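Lucene writes index data into immutable segments and later merges them, rather than updating entries in place. The sketch below shows only that idea. It assumes additions only, with no deletes or updates. Segment and SegmentedIndex are hypothetical names, and neither reflects Lucene's actual file format or API.

```java
import java.util.*;

/** Hypothetical immutable segment: term -> sorted doc ids, written once, never changed. */
final class Segment {
    final SortedMap<String, SortedSet<Integer>> postings;

    Segment(Map<String, ? extends Collection<Integer>> postings) {
        SortedMap<String, SortedSet<Integer>> m = new TreeMap<>();
        postings.forEach((t, ids) ->
                m.put(t, Collections.unmodifiableSortedSet(new TreeSet<Integer>(ids))));
        this.postings = Collections.unmodifiableSortedMap(m);
    }
}

/** An index snapshot is an immutable list of segments; new content appends a segment. */
final class SegmentedIndex {
    final List<Segment> segments;

    SegmentedIndex(List<Segment> segments) {
        this.segments = List.copyOf(segments);
    }

    static SegmentedIndex empty() {
        return new SegmentedIndex(List.of());
    }

    /** Returns a new snapshot with one more segment; this snapshot is unchanged. */
    SegmentedIndex append(Segment s) {
        List<Segment> l = new ArrayList<>(segments);
        l.add(s);
        return new SegmentedIndex(l);
    }

    /** Looks the term up in every segment and returns the union of the hits. */
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> out = new TreeSet<>();
        for (Segment s : segments) {
            out.addAll(s.postings.getOrDefault(term, Collections.emptySortedSet()));
        }
        return out;
    }

    /** Writes one new merged segment instead of editing the existing ones. */
    SegmentedIndex merge() {
        Map<String, SortedSet<Integer>> m = new TreeMap<>();
        for (Segment s : segments) {
            s.postings.forEach((t, ids) -> m.computeIfAbsent(t, k -> new TreeSet<>()).addAll(ids));
        }
        return new SegmentedIndex(List.of(new Segment(m)));
    }
}
```

A reader that holds an older SegmentedIndex keeps a consistent view after newer segments are appended or merged. That is the property that makes segments look attractive for an append-only tree.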

> I was previously in the understanding that the NGP would be the storage
> layer which operates below or in the place of the current
> SharedItemStateManager.  With the remark to implement things like
> Node.getNodes() I gather that you want to do away (in time) with the
> set of ItemStateManagers.

Correct. I think the ItemState model, while flexible and proven, is
preventing us from reaching a number of performance improvements
because it focuses on content at a very granular level (e.g.
Node.getNodes() is a victim of the classic n*SELECT problem largely
because of this architecture). It also requires quite complex caching
and cache invalidation logic that makes the implementation hard to
follow. I also don't like the inherent need for locking and
synchronization and the fact that we need to rely on external support
for proper

In summary, while I do like and appreciate the current design, I also
think that it's starting to show its age and that we need to look for
alternatives to reach new performance and scalability levels.
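The following sketch makes the n*SELECT cost of Node.getNodes() concrete. It counts storage reads for two ways of listing a node's children and ignores caching. CountingStore and its methods are hypothetical. readChildIds plus readItem mimic an item-at-a-time model, and readChildren assumes a layout where a node's children can be fetched together. None of them are Jackrabbit APIs.

```java
import java.util.*;

/** Hypothetical backing store that counts how many reads ("SELECTs") it serves. */
final class CountingStore {
    private final Map<String, List<String>> childIds = new HashMap<>();
    private final Map<String, String> data = new HashMap<>();
    int reads = 0;

    void put(String parent, String id, String value) {
        childIds.computeIfAbsent(parent, k -> new ArrayList<>()).add(id);
        data.put(id, value);
    }

    /** One read: the ids of a node's children. */
    List<String> readChildIds(String parent) {
        reads++;
        return List.copyOf(childIds.getOrDefault(parent, List.of()));
    }

    /** One read: the state of a single item. */
    String readItem(String id) {
        reads++;
        return data.get(id);
    }

    /** One read: all children of a node, as a tree-oriented layout could store them together. */
    List<String> readChildren(String parent) {
        reads++;
        List<String> out = new ArrayList<>();
        for (String id : childIds.getOrDefault(parent, List.of())) {
            out.add(data.get(id));
        }
        return out;
    }
}

final class GetNodesDemo {
    /** Item-at-a-time: 1 read for the id list plus 1 read per child, n + 1 in total. */
    static List<String> perItem(CountingStore s, String parent) {
        List<String> out = new ArrayList<>();
        for (String id : s.readChildIds(parent)) {
            out.add(s.readItem(id));
        }
        return out;
    }

    /** Subtree-at-a-time: a single read regardless of the number of children. */
    static List<String> batched(CountingStore s, String parent) {
        return s.readChildren(parent);
    }
}
```

With 100 children the item-at-a-time path costs 101 reads, against 1 for the batched read. The gap grows linearly with the number of children.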

> Apart from this being a total re-write, which would block a lot of progress,

This is why I'm working inside a sandbox and want to come up with
*very* compelling technical arguments and measured performance
improvements before suggesting to bring the code inside
jackrabbit-core. Also, I don't yet know whether the road I'm headed
down will end up in a dead end, so for now the effort is strictly
limited to prototyping. In any case, even if the NGP model proves
successful in practice, I think it's realistic to expect that we
won't change the core architecture before something like Jackrabbit
3.0, a few years from now.

> I'm also worried that this would tie in the implementation of JCR by
> Jackrabbit a lot with how things would be stored.

I'm not too worried about this. Currently the PersistenceManager model
dictates much of the storage model, and in fact I think that changing
this model is *the* key to any major improvements.

Just like the PersistenceManager model essentially forces the storage
layer into a key-value mapping, the NGP model requires an append-only
tree hierarchy. My main assumption is that the latter is a more
efficient and natural model for JCR content trees.
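The contrast between the two storage contracts can be sketched under a few simplifying assumptions. Values are strings. Items in the key-value case are addressed by path-like keys. Records in the append-only case refer to their children by position in the log. Both classes are hypothetical illustrations, not the actual PersistenceManager interface or an NGP implementation. In the key-value store a write replaces the previous state. In the log, a write appends new records for the changed node and its ancestors only, so every earlier root offset still reads back the tree as it was.

```java
import java.util.*;

/** Hypothetical key-value contract: storing an item overwrites its previous state. */
final class KeyValueStore {
    private final Map<String, String> items = new HashMap<>();
    void store(String id, String state) { items.put(id, state); }
    String load(String id) { return items.get(id); }
}

/** Hypothetical append-only contract: records are only appended, never rewritten. */
final class AppendOnlyTreeLog {
    static final class Rec {
        final String value;
        final Map<String, Integer> children; // child name -> log offset of the child record

        Rec(String value, Map<String, Integer> children) {
            this.value = value;
            this.children = Map.copyOf(children);
        }
    }

    private final List<Rec> log = new ArrayList<>();

    int append(Rec r) { log.add(r); return log.size() - 1; }
    Rec read(int offset) { return log.get(offset); }

    /** Sets the value at path below root by appending the changed path; returns the new root offset. */
    int update(int root, List<String> path, String value) {
        Rec r = read(root);
        if (path.isEmpty()) {
            return append(new Rec(value, r.children));
        }
        Map<String, Integer> kids = new HashMap<>(r.children);
        Integer child = kids.get(path.get(0));
        int base = child != null ? child : append(new Rec(null, Map.of()));
        kids.put(path.get(0), update(base, path.subList(1, path.size()), value));
        return append(new Rec(r.value, kids));
    }

    /** Reads the value at path as seen from the given root offset (any revision). */
    String get(int root, List<String> path) {
        Rec r = read(root);
        for (String name : path) {
            Integer off = r.children.get(name);
            if (off == null) return null;
            r = read(off);
        }
        return r.value;
    }
}
```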

> NGP looking like a sound idea, is not the only method of storage, and I
> would rather see the ability of different storage layers with different
> characteristics.

There's nothing stopping us from having that for NGP as well. Of course the high
level architecture dictates the access patterns and the generic
content structure, but this doesn't mean that the underlying bit
patterns or storage locations need to be the same.
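The following sketch illustrates one contract backed by different layouts. A hypothetical RecordStore interface has two implementations. The first keeps each record as a separate array addressed by sequence number. The second packs length-prefixed records into one buffer addressed by byte offset. The interface and class names are assumptions made for this sketch. A tree layer written against RecordStore would not need to know which layout is in use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.*;

/** Hypothetical append-only contract: the tree layer only needs append and read-by-address. */
interface RecordStore {
    long append(byte[] record);
    byte[] read(long address);
}

/** Layout 1: one array per record; the address is the record's sequence number. */
final class ListRecordStore implements RecordStore {
    private final List<byte[]> records = new ArrayList<>();

    public long append(byte[] r) {
        records.add(r.clone());
        return records.size() - 1;
    }

    public byte[] read(long a) {
        return records.get((int) a).clone();
    }
}

/** Layout 2: one contiguous buffer of length-prefixed records; the address is a byte offset. */
final class PackedRecordStore implements RecordStore {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    public long append(byte[] r) {
        long offset = buf.size();
        buf.writeBytes(ByteBuffer.allocate(4).putInt(r.length).array()); // 4-byte big-endian length
        buf.writeBytes(r);
        return offset;
    }

    public byte[] read(long a) {
        ByteBuffer bb = ByteBuffer.wrap(buf.toByteArray());
        bb.position((int) a);
        byte[] out = new byte[bb.getInt()];
        bb.get(out);
        return out;
    }
}
```

PackedRecordStore copies the whole buffer on every read to keep the sketch short. A real implementation would read from a file or a memory-mapped region instead.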


Jukka Zitting
