jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: [jr3] Tree model
Date Thu, 01 Mar 2012 19:13:34 GMT

It looks like my initial attempt at this didn't work too well, as my
intention wasn't clear enough and the interface draft I included
seemed to raise mostly concerns about technicalities (too many
methods, etc.) instead of the fundamental design tradeoffs I was
trying to highlight. So let's try this again.

What I'm looking for is a clear, shared idea of what a jr3 content
tree looks like at a low level (i.e. before stuff like node types,
etc.) since the current MK interface leaves many of those details
unspecified. Here's what the MK interface currently says about this:

>  * The MicroKernel <b>Data Model</b>:
>  * <ul>
>  * <li>simple JSON-inspired data model: just nodes and properties</li>
>  * <li>a node is represented as an object, consisting of an unordered collection
>  * of properties and an array of (child) node objects</li>
>  * <li>properties are represented as name/value pairs</li>
>  * <li>MVPs are represented as name/array-of-values pairs</li>
>  * <li>supported property types: string, number</li>
>  * <li>other property types (weak/hard reference, date, etc) would need to be
>  * encoded/mangled in name or value</li>
>  * <li>no support for JCR/XML-like namespaces, "foo:bar" is just an ordinary name</li>
>  * <li>properties and child nodes share the same namespace, i.e. a property and
>  * a child node, sharing the same parent node, cannot have the same name</li>
>  * </ul>

There are a few complications and missing details with this model (as
documented) that I tried to address in my original proposal. The most
notable are:

* The data model specifies that a node contains "an array of
(child) node objects" and seems to imply that child nodes are always
orderable. This is a major design constraint for the underlying
storage model that doesn't seem necessary (a higher-level component
could store ordering information explicitly) or desirable (see past
discussions on this). To avoid this I think child nodes should be
treated as an unordered set of name/node mappings.

* Another unspecified bit is whether same-name-siblings need to be
supported on the storage level. The MK implies that SNSs are not
supported (i.e. a higher level component needs to use things like name
mangling to implement SNSs on top of the MK), but the note about "an
*array* of (child) node objects" kind of leaves the door open for two
child nodes to (perhaps accidentally) have the same name. For this
reason too, I think child nodes should be treated as a map from names
to corresponding nodes.

* The data model doesn't specify whether the name of a node is an
integral part of the node itself. The implementation(s) clarify (IMHO
correctly) that the name of each child node is more logically a part
of the parent node. Thus, unlike in JCR, there should be no getName()
method on a low-level interface for nodes.

* Somewhat contrary to the above, the data model specifies properties
as "name/value pairs". The MK interface doesn't allow individual
properties to be accessed separately, so this detail doesn't show up
too much in practice. However, in terms of an internal API it would be
useful to keep properties mostly analogous to child nodes. Thus there
should be no getName() method on a low-level interface for properties
(or, perhaps more accurately, "values").

* The data model says that "properties and child nodes share the same
namespace" but treats properties and child nodes differently in other
aspects (properties as "an unordered collection", child nodes as "an
array"). This seems like an unnecessary complication that's likely to
cause trouble down the line (e.g. where and how will we enforce this
constraint?). From an API point of view it would be cleanest either to
treat both properties and child nodes equally (like having all as
parts of a single unordered set of name/item mappings) or to allow and
use completely separate spaces for property and child node names.

* Finally, while the MK interface doesn't spell it out explicitly, the
implicit consequence of using MVCC and referencing revision
identifiers in method calls is that the underlying tree model is
essentially immutable. The content tree only changes when a new
revision is constructed, while all past revisions remain intact. To
reflect this, an internal tree API should be mostly immutable.
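
As a rough illustration of the first point, a higher-level component
could keep child order as an explicit list of names next to the
unordered name/node mapping. The sketch below assumes such a stored
list exists; ChildOrderSketch and orderedNames are hypothetical names
and not part of the MK interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not part of the MK: ordering layered on an unordered child set. */
final class ChildOrderSketch {

    /**
     * Returns the child names in the explicitly stored order.
     * Assumption: the stored order may be stale, so names that no longer
     * exist are skipped and names missing from the list are appended in
     * sorted order to keep the result deterministic.
     */
    static List<String> orderedNames(Set<String> childNames, List<String> storedOrder) {
        List<String> result = new ArrayList<>();
        for (String name : storedOrder) {
            if (childNames.contains(name) && !result.contains(name)) {
                result.add(name);
            }
        }
        for (String name : new TreeSet<>(childNames)) {
            if (!result.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

With this kind of layering the storage model never has to maintain
order itself; only nodes that actually need ordering pay for it.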

These are in my mind the key issues that I think we should try to
reach an agreement on. The exact form of the interface that expresses
such consensus is IMHO of lesser importance, which is why I don't feel
too strongly about things like the use of java.util.Map or the Visitor
pattern. Such details can be changed down the line based on
experience, but deeper features like addressing and the orderability
of nodes and properties are very expensive to change later on.

My proposal, as drafted in the Tree interface, essentially says:

1) Properties and child nodes are all addressed using an unordered
name->item mapping on the parent node.
2) Neither properties nor child nodes know their own name (or their
parent). That information is kept only within the parent node.
3) Content trees are immutable except in clearly documented cases.
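
A minimal Java sketch of what these three points could look like
together. This is not the Tree interface draft itself: Item, ValueItem
and NodeItem are hypothetical names, values are simplified to strings,
and the sketch is fully immutable rather than "immutable except in
clearly documented cases".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical common type for anything stored under a name. */
interface Item {
}

/** A value (property) that knows neither its name nor its parent. */
final class ValueItem implements Item {
    private final String value;

    ValueItem(String value) {
        this.value = value;
    }

    String asString() {
        return value;
    }
}

/** A node holding a single unordered name->item mapping for properties and children. */
final class NodeItem implements Item {
    private final Map<String, Item> items;

    NodeItem(Map<String, Item> items) {
        // Defensive copy plus unmodifiable view: later changes to the
        // caller's map cannot leak into this node.
        this.items = Collections.unmodifiableMap(new HashMap<>(items));
    }

    /** Returns the item stored under the given name, or null if there is none. */
    Item getItem(String name) {
        return items.get(name);
    }

    /** Returns a read-only view of all name->item mappings. */
    Map<String, Item> getItems() {
        return items;
    }
}
```

Note that there is no getName() or getParent() anywhere: the only
place a name appears is as a key in the parent's mapping.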

Some concerns, especially about the first and third items, were raised
in the followup discussion. Based on those concerns, a possible
alternative for the first item could be:

1a) Properties are addressed using an unordered name->property mapping
on the parent node
1b) Child nodes are addressed using an unordered name->node mapping on
the parent node
1c) The spaces for property and child node names are distinct.
Possible restrictions on this need to be implemented on a higher
level.
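
A sketch of this alternative, assuming string-valued properties for
brevity. SplitNode and its method names are hypothetical. Because the
two mappings are separate, a property and a child node can share a
name here, and any rule against that would live in a higher layer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node with distinct name spaces for properties and child nodes. */
final class SplitNode {
    private final Map<String, String> properties;
    private final Map<String, SplitNode> children;

    SplitNode(Map<String, String> properties, Map<String, SplitNode> children) {
        // Copy both maps so the node stays immutable after construction.
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        this.children = Collections.unmodifiableMap(new HashMap<>(children));
    }

    /** Looks up a property only; child nodes with the same name are ignored. */
    String getProperty(String name) {
        return properties.get(name);
    }

    /** Looks up a child node only; properties with the same name are ignored. */
    SplitNode getChildNode(String name) {
        return children.get(name);
    }
}
```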

An alternative for the third item could be:

3a) Content trees are always immutable.
3b) A separate builder API is used to construct new or modified
content trees.
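
One possible shape for such a builder, again only as a sketch under
assumptions: FrozenNode and FrozenNodeBuilder are hypothetical names,
and properties are simplified to strings. A builder starts from an
existing node, collects changes in private copies, and build()
returns a new immutable node while the base node stays untouched.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical always-immutable node. */
final class FrozenNode {
    static final FrozenNode EMPTY = new FrozenNode(
            Collections.<String, String>emptyMap(),
            Collections.<String, FrozenNode>emptyMap());

    final Map<String, String> properties;
    final Map<String, FrozenNode> children;

    FrozenNode(Map<String, String> properties, Map<String, FrozenNode> children) {
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        this.children = Collections.unmodifiableMap(new HashMap<>(children));
    }

    String getProperty(String name) {
        return properties.get(name);
    }

    FrozenNode getChildNode(String name) {
        return children.get(name);
    }

    /** Starts a builder preloaded with this node's content. */
    FrozenNodeBuilder builder() {
        return new FrozenNodeBuilder(this);
    }
}

/** Hypothetical mutable builder; the only place where changes are made. */
final class FrozenNodeBuilder {
    private final Map<String, String> properties;
    private final Map<String, FrozenNode> children;

    FrozenNodeBuilder(FrozenNode base) {
        // Work on copies so the base node is never modified.
        this.properties = new HashMap<>(base.properties);
        this.children = new HashMap<>(base.children);
    }

    FrozenNodeBuilder setProperty(String name, String value) {
        properties.put(name, value);
        return this;
    }

    FrozenNodeBuilder setChildNode(String name, FrozenNode child) {
        children.put(name, child);
        return this;
    }

    FrozenNodeBuilder removeChildNode(String name) {
        children.remove(name);
        return this;
    }

    /** Produces a new immutable node from the current builder state. */
    FrozenNode build() {
        return new FrozenNode(properties, children);
    }
}
```

This lines up with the MVCC point earlier: each build() yields what
would be a new revision, and every earlier node remains readable.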

Can we reach consensus on some of these models (or yet another
alternative)? If yes, it should be fairly straightforward to draft an
interface that captures such consensus and addresses the more detailed
concerns people have expressed.


Jukka Zitting
