jackrabbit-dev mailing list archives

From Stefan Guggisberg <stefan.guggisb...@gmail.com>
Subject Re: [jr3] Tree model
Date Fri, 02 Mar 2012 16:17:13 GMT
On Thu, Mar 1, 2012 at 8:13 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
>
> It looks like my initial attempt at this didn't work too well, as my
> intention wasn't clear enough and the interface draft I included
> seemed to raise mostly concerns about technicalities (too many
> methods, etc.) instead of the fundamental design tradeoffs I was
> trying to highlight. So let's try this again.
>
> What I'm looking for is a clear, shared idea of what a jr3 content
> tree looks like at a low level (i.e. before stuff like node types,
> etc.) since the current MK interface leaves many of those details
> unspecified. Here's what the MK interface currently says about this:
>
>>  * The MicroKernel <b>Data Model</b>:
>>  * <ul>
>>  * <li>simple JSON-inspired data model: just nodes and properties</li>
>>  * <li>a node is represented as an object, consisting of an unordered collection
>>  * of properties and an array of (child) node objects</li>
>>  * <li>properties are represented as name/value pairs</li>
>>  * <li>MVPs are represented as name/array-of-values pairs</li>
>>  * <li>supported property types: string, number</li>
>>  * <li>other property types (weak/hard reference, date, etc) would need to be
>>  * encoded/mangled in name or value</li>
>>  * <li>no support for JCR/XML-like namespaces, "foo:bar" is just an ordinary name</li>
>>  * <li>properties and child nodes share the same namespace, i.e. a property and
>>  * a child node, sharing the same parent node, cannot have the same name</li>
>>  * </ul>

please note that the above is somewhat outdated, e.g. MVPs
are IMO just regular properties whose values are encoded in a single string
(transparent to the mk).
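
To illustrate what "encoded in a single string" could look like, here is a minimal sketch. The class name MultiValueCodec and the length-prefixed format are assumptions made for this example only; the thread does not say which encoding a layer above the mk would actually use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec: packs several values into one string as "length:value" segments. */
final class MultiValueCodec {

    /** Encodes each value as its character count, a colon, and the value itself. */
    static String encode(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String value : values) {
            sb.append(value.length()).append(':').append(value);
        }
        return sb.toString();
    }

    /** Reverses encode(); the length prefix makes colons inside values harmless. */
    static List<String> decode(String encoded) {
        List<String> values = new ArrayList<>();
        int pos = 0;
        while (pos < encoded.length()) {
            int colon = encoded.indexOf(':', pos);
            int length = Integer.parseInt(encoded.substring(pos, colon));
            int start = colon + 1;
            values.add(encoded.substring(start, start + length));
            pos = start + length;
        }
        return values;
    }
}
```

With such a codec the mk only ever sees an ordinary string property; the layer above decides when a string is to be read as a list of values.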

>
> There are a few complications and missing details with this model (as
> documented) that I tried to address in my original proposal. The most
> notable are:

thanks for bringing this up! comments follow inline...

>
> * The data model specifies that a node contains "an array of
> (child) node objects" and seems to imply that child nodes are always
> orderable. This is a major design constraint for the underlying
> storage model that doesn't seem necessary (a higher-level component
> could store ordering information explicitly) or desirable (see past
> discussions on this). To avoid this I think child nodes should be
> treated as an unordered set of name/node mappings.

i don't think that it is a major design constraint in general.
since a lot of content in a jcr repository is expected to be
ordered (-> nt:unstructured), we should IMO support this in the mk
rather than delegate it to the upper layer.

i agree that very 'flat' nodes are a special case.
how about stating something along the lines of:

child nodes are an orderable (implying 'ordered' of course) set
of name/node mappings. however, if the size of the set
exceeds a certain (discoverable?) threshold, it might just be
ordered, but not orderable.
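
The rule above could be sketched roughly as follows. ChildNames, getOrderableThreshold() and moveBefore() are hypothetical names, and using sorted order as the stable fallback is just one possible choice. The class is mutable only to keep the ordering logic short; it is not meant as the immutable tree abstraction discussed elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: child names stay orderable only up to a size threshold. */
final class ChildNames {
    private final int threshold;                               // assumed "discoverable" limit
    private final List<String> userOrder = new ArrayList<>();  // used while the set is small
    private SortedSet<String> stableOrder;                     // used once the threshold is exceeded

    ChildNames(int threshold) { this.threshold = threshold; }

    int getOrderableThreshold() { return threshold; }

    boolean isOrderable() { return stableOrder == null; }

    boolean contains(String name) {
        return isOrderable() ? userOrder.contains(name) : stableOrder.contains(name);
    }

    void add(String name) {
        if (contains(name)) {
            throw new IllegalArgumentException("duplicate child name: " + name);
        }
        if (!isOrderable()) {
            stableOrder.add(name);
            return;
        }
        userOrder.add(name);
        if (userOrder.size() > threshold) {
            // Too many children: give up user-defined order, keep a stable one.
            stableOrder = new TreeSet<>(userOrder);
            userOrder.clear();
        }
    }

    /** Moves an existing child in front of another one; only allowed while orderable. */
    void moveBefore(String name, String before) {
        if (!isOrderable()) {
            throw new UnsupportedOperationException("too many child nodes to reorder");
        }
        if (!userOrder.contains(name) || !userOrder.contains(before)) {
            throw new IllegalArgumentException("unknown child name");
        }
        if (name.equals(before)) {
            return;
        }
        userOrder.remove(name);
        userOrder.add(userOrder.indexOf(before), name);
    }

    /** Returns the names in their current (user-defined or stable) order. */
    List<String> names() {
        return isOrderable() ? new ArrayList<>(userOrder) : new ArrayList<>(stableOrder);
    }
}
```

A client can check isOrderable() before attempting a reorder, which is where a discoverable threshold would matter in practice.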

>
> * Another unspecified bit is whether same-name-siblings need to be
> supported on the storage level. The MK implies that SNSs are not
> supported (i.e. a higher level component needs to use things like name
> mangling to implement SNSs on top of the MK), but the note about "an
> *array* of (child) node objects" kind of leaves the door open for two
> child nodes to (perhaps accidentally) have the same name. For this
> reason as well I think child nodes should be treated as a map from names
> to corresponding nodes.

agreed, good point.

>
> * The data model doesn't specify whether the name of a node is an
> integral part of the node itself. The implementation(s) clarify (IMHO
> correctly) that the name of each child node is more logically a part
> of the parent node. Thus, unlike in JCR, there should be no getName()
> method on a low-level interface for nodes.

correct.

>
> * Somewhat contrary to the above, the data model specifies properties
> as "name/value pairs". The MK interface doesn't allow individual
> properties to be accessed separately, so this detail doesn't show up
> too much in practice. However, in terms of an internal API it would be
> useful to keep properties mostly analogous to child nodes. Thus there
> should be no getName() method on a low-level interface for properties
> (or, perhaps more accurately, "values").
>
> * The data model says that "properties and child nodes share the same
> namespace" but treats properties and child nodes differently in other
> aspects (properties as "an unordered collection", child nodes as "an
> array"). This seems like an unnecessary complication that's likely to
> cause trouble down the line (e.g. where and how will we enforce this
> constraint?). From an API point of view it would be cleanest either to
> treat both properties and child nodes equally (like having all as
> parts of a single unordered set of name/item mappings) or to allow and
> use completely separate spaces for property and child node names.

properties and child nodes sharing the same namespace is IMO
an important requirement since we want to be close to the json model.

for the tree abstraction (representing an immutable hierarchy)
i am fine with separate methods for getting nodes/properties.
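
A shared namespace and separate accessors do not exclude each other. The following is only a sketch under assumptions: SketchNode and its method names are invented here, and property values are simplified to plain strings.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical immutable node: separate accessors, but one namespace for all names. */
final class SketchNode {
    private final Map<String, String> properties;
    private final Map<String, SketchNode> children;

    SketchNode(Map<String, String> properties, Map<String, SketchNode> children) {
        // Enforce the shared namespace once, at construction time.
        for (String name : properties.keySet()) {
            if (children.containsKey(name)) {
                throw new IllegalArgumentException(
                        "name used by both a property and a child node: " + name);
            }
        }
        // Defensive copies, so later changes to the arguments cannot leak in.
        this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
        this.children = Collections.unmodifiableMap(new LinkedHashMap<>(children));
    }

    /** Returns the property value, or null if there is no such property. */
    String getProperty(String name) { return properties.get(name); }

    /** Returns the child node, or null if there is no such child. */
    SketchNode getChildNode(String name) { return children.get(name); }
}
```

Because the node is immutable, the constraint has to be checked in exactly one place, which speaks to the "where and how will we enforce this" concern quoted above.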

>
> * Finally, while the MK interface doesn't spell it out explicitly, the
> implicit consequence of using MVCC and referencing revision
> identifiers in method calls is that the underlying tree model is
> essentially immutable. The content tree only changes when a new
> revision is constructed, while all past revisions remain intact. To
> reflect this, an internal tree API should be mostly immutable.

agreed.

>
> These are in my mind the key issues that I think we should try to
> reach an agreement on. The exact form of the interface that expresses
> such consensus is IMHO of lesser importance, which is why I don't feel
> too strongly about things like the use of java.util.Map or the Visitor
> pattern. Such details can be changed down the line based on
> experience, but deeper features like addressing and the orderability
> of nodes and properties are very expensive to change later on.
>
> My proposal, as drafted in the Tree interface, essentially says:
>
> 1) Properties and child nodes are all addressed using an unordered
> name->item mapping on the parent node.
> 2) Neither properties nor child nodes know their own name (or their
> parent). That information is kept only within the parent node.
> 3) Content trees are immutable except in clearly documented cases.
>
> Some concerns about especially the first and third items were raised
> in the followup discussion. Based on those concerns, a possible
> alternative for the first item could be:
>
> 1a) Properties are addressed using an unordered name->property mapping
> on the parent node
> 1b) Child nodes are addressed using an unordered name->node mapping on
> the parent node
> 1c) The spaces for property and child node names are distinct.
> Possible restrictions on this need to be implemented on a higher
> level.
>
> An alternative for the third item could be:
>
> 3a) Content trees are always immutable.
> 3b) A separate builder API is used for constructing new or modified
> content trees.
>

here's my take:

1a) as suggested

=> 1b') Child nodes are addressed using an orderable name->node mapping on
the parent node. If the number of mappings exceeds a certain (discoverable?)
threshold, the mapping may lose 'orderability' while retaining a stable
(non-user-defined) order.

=> 1c') properties and child nodes share the same namespace

2) as suggested
3a) & 3b) as suggested
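
As a rough illustration of 2), 3a) and 3b) together with 1c'), a minimal sketch might look like the code below. ImmutableTree and TreeBuilder are hypothetical names, property values are simplified to strings, and child ordering (1b') is left out to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable tree: no getName(), no getParent(), no setters. */
final class ImmutableTree {
    static final ImmutableTree EMPTY = new ImmutableTree(
            Collections.<String, String>emptyMap(),
            Collections.<String, ImmutableTree>emptyMap());

    final Map<String, String> properties;       // names live only in the parent's map
    final Map<String, ImmutableTree> children;

    ImmutableTree(Map<String, String> properties, Map<String, ImmutableTree> children) {
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        this.children = Collections.unmodifiableMap(new HashMap<>(children));
    }

    /** Starts a new revision based on this tree; this tree itself never changes. */
    TreeBuilder builder() { return new TreeBuilder(this); }
}

/** Hypothetical builder: the only place where modifications happen. */
final class TreeBuilder {
    private final Map<String, String> properties;
    private final Map<String, ImmutableTree> children;

    TreeBuilder(ImmutableTree base) {
        this.properties = new HashMap<>(base.properties);
        this.children = new HashMap<>(base.children);
    }

    TreeBuilder setProperty(String name, String value) {
        if (children.containsKey(name)) {       // shared namespace, as in 1c')
            throw new IllegalArgumentException("name already used by a child node: " + name);
        }
        properties.put(name, value);
        return this;
    }

    TreeBuilder setChild(String name, ImmutableTree child) {
        if (properties.containsKey(name)) {     // shared namespace, as in 1c')
            throw new IllegalArgumentException("name already used by a property: " + name);
        }
        children.put(name, child);
        return this;
    }

    TreeBuilder remove(String name) {
        properties.remove(name);
        children.remove(name);
        return this;
    }

    ImmutableTree build() { return new ImmutableTree(properties, children); }
}
```

Each build() call yields a new tree while earlier trees stay intact, which mirrors how MVCC revisions are described above.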

cheers
stefan

> Can we reach consensus on some of these models (or yet another
> alternative)? If yes, it should be fairly straightforward to draft an
> interface that captures such consensus and addresses the more detailed
> concerns people have expressed.
>
> BR,
>
> Jukka Zitting
