jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: [jr3] Tree model
Date Sat, 03 Mar 2012 10:51:56 GMT
Hi,

On Fri, Mar 2, 2012 at 5:17 PM, Stefan Guggisberg
<stefan.guggisberg@gmail.com> wrote:
> On Thu, Mar 1, 2012 at 8:13 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
>> * The data model specifies that a node contains an "an array of
>> (child) node objects" and seems to imply that child nodes are always
>> orderable. This is a major design constraint for the underlying
>> storage model that doesn't seem necessary (a higher-level component
>> could store ordering information explicitly) or desirable (see past
>> discussions on this). To avoid this I think child nodes should be
>> treated as an unordered set of name/node mappings.
>
> i don't think that it is a major design constraint in general.
> since in a jcr repository a lot of content is expected to be
> ordered (-> nt:unstructured) we should IMO support this in the mk
> and don't delegate this to the upper layer.

I'm OK with that tradeoff. Basically we're saying that child nodes are
both ordered (in some stable order) *and* orderable (in that they can
be explicitly ordered). This means that the MK implementation must
store explicit ordering information for nodes, which can be a bit
costly for large nodes, as you mention below.

> i agree that very 'flat' nodes are a special case.
> how about stating something along the line of:
>
> child nodes are an orderable (implying 'ordered' of course) set
> of name/node mappings. however, if the size of the set
> exceeds a certain (discoverable?) trheshold, it might just
> ordered, but not orderable.

That's a bit vague, but it should be fine if we explain the tradeoff
properly. Perhaps something like the following?

    /**
     * Returns an iterable of the child node entries starting from the
     * given offset and containing the given number of entries. The order
     * of child nodes is normally as specified by the client that created
     * or reordered them.
     * <p>
     * The underlying implementation may for performance reasons decide
     * to automatically reorder the children of large nodes (with more than
     * 1,000 child nodes). In such cases the original ordering of nodes
     * is lost and the nodes may no longer be explicitly orderable, but the
     * children are still ordered in some stable, implementation-dependant
     * order.
     *
     * @param offset start offset from which to return entries;
     *               with <code>0</code> being the offset of the first entry
     * @param length number of entries to return;
     *               use <code>-1</code> to return all remaining entries
     */
    Iterable<ChildNodeEntry> getChildNodeEntries(int offset, int length);

Alternatively, to avoid such vagueness without imposing performance
penalties on large, flat nodes, we could define two different
iteration orders, one for the explicitly specified order and a native,
more efficient one. That would allow efficient tree traversal for
cases like indexing or garbage collection where ordering is not
relevant, while still maintaining explicit ordering for cases where
it's needed (and that are willing to live with a possible performance
hit).

    /**
     * Returns an iterable of the child node entries starting from the
     * given offset and containing the given number of entries.
     * <p>
     * The order of child nodes is by default as specified by the
     * client that created or reordered them, but the caller can also
     * ask the underlying implementation to return nodes in their
     * native order that may be more efficient to iterate over.
     * To request such native ordering, the caller should specify
     * the offset parameter in ones' complement form
     * (i.e. <code>~offset</code>).
     *
     * @param offset start offset from which to return entries;
     *               with <code>0</code> being the offset of the first entry,
     *               and negative offsets interpreted as described above
     * @param length number of entries to return;
     *               use <code>-1</code> to return all remaining entries
     */
    Iterable<ChildNodeEntry> getChildNodeEntries(int offset, int length);

>> * Another unspecified bit is whether same-name-siblings need to be
>> supported on the storage level. The MK implies that SNSs are not
>> supported (i.e. a higher level component needs to use things like name
>> mangling to implement SNSs on top of the MK), but the note about "an
>> *array* of (child) node objects" kind of leaves the door open for two
>> child nodes to (perhaps accidentally) have the same name. For also
>> this reason I think child nodes should be treated as a map from names
>> to corresponding nodes.
>
> agreed, good point.

Note that if we do mandate orderability on child nodes, supporting
SNSs on the MK level becomes much less of an issue as the internal
storage model needs to be (at least some form of) a list instead of
just a map. Anyway, I'm OK with explicitly scoping out SNSs from the
MK level for now, though we may want or need to revisit that decision
depending on how complex implementing SNSs on a higher level turns out
to be.

> properties and child nodes sharing the same namespace is IMO
> an important requirement since we want to be close to the json model.
>
> for the tree abstraction (representing an immutable hierarchy)
> i am fine with separate methods for getting nodes/properties.

OK. Something like this perhaps:

    /**
     * Returns the named property. The name is an opaque string and
     * is not parsed or otherwise interpreted by this method.
     * <p>
     * The namespace of properties and child nodes is shared, so if
     * this method returns a non-<code>null</code> value for a given
     * name, then {@link #getNode(String)} is guaranteed to return
     * <code>null</code> for the same name.
     *
     * @param name name of the property to return
     * @return named property, or <code>null</code> if not found
     */
    Property getProperty(String name);

    /**
     * Returns the named child node. The name is an opaque string and
     * is not parsed or otherwise interpreted by this method.
     * <p>
     * The namespace of properties and child nodes is shared, so if
     * this method returns a non-<code>null</code> value for a given
     * name, then {@link #getProperty(String)} is guaranteed to return
     * <code>null</code> for the same name.
     *
     * @param name name of the child node to return
     * @return named child node, or <code>null</code> if not found
     */
    Node getNode(String name);

Note that I used the terms Node, Property and ChildNodeEntry above
since they are probably the most familiar to people. However, to avoid
confusion with related JCR interfaces and existing Jackrabbit
internals, it might be a good idea to use some other naming pattern.
Following my original Tree draft (which would work nicely with Oak as
the codename ;-), we could for example use terms Tree, Leaf and
Branch, respectively.

BR,

Jukka Zitting

Mime
View raw message