jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jukka Zitting <jukka.zitt...@gmail.com>
Subject [jr3] Tree model
Date Tue, 28 Feb 2012 14:59:23 GMT
Hi,

[Here's me looking at lower-level details for a change.]

I was going through the prototype codebases in the sandbox, trying to
come up with some clean and simple lowest-common-denominator -style
interface for representing content trees. Basically a replacement for
the ItemState model in current Jackrabbit.

The trouble I find with both the current Jackrabbit ItemState model
and the efforts in the sandbox is that this key concept is modeled as
concrete classes instead of as interfaces. Using an interface to
describe and document the fundamental tree model gives us a lot of
extra freedom on the implementation side (lazy loading, decoration,
virtual content, etc.).

So, how should we go about constructing such an interface? I started
by laying some ground rules based on concepts from the sandbox and
past jr3 discussions:

  * A content tree is composed of a hierachy of items
  * Tree items are either leaves or non-leaves
  * Non-leaves contain zero or more named child items (*all* other
data is stored at leaves)
  * Each child item is *uniquely* named within its parent
  * Names are just opaque strings
  * Leaves contain typed data (strings, numbers, binaries, etc.)
  * Content trees are *immutable* except in specific circumstances
(transient changes)

As a corollary of such a proposed design, the following features (that
with a different tree model could be a part of the underlying storage
model) would need to be handled as higher level constructs:

  * Same-name-siblings (perhaps by name-mangling)
  * Namespaces and other name semantics
  * Ordering of child nodes (perhaps with a custom order property)
  * Path handling
  * Identifiers and references
  * Node types
  * Versioning
  * etc., etc.

As you can see, it's a very low-level interface I'm talking about.
With that background, here's what I'm proposing:
https://gist.github.com/1932695 (also included as text at the end of
this message). Note that this proposal covers only the interfaces for
accessing content (with a big TODO in the Leaf interface). A separate
builder or factory interface will be needed for content changes in
case this design is adopted.

Please criticize, as this is just a quick draft and I'm likely to miss
something fairly important. I'm hoping to evolve this to something we
could use as a well-documented and thought-of internal abstraction for
jr3. Or, if this idea is too broken, to provoke someone to provide a
good counter-proposal. :-)

BR,

Jukka Zitting

----

import java.util.Map;

/**
 * Trees are the key concept in a hierarchical content repository.
 * This interface is a low-level tree node representation that just
 * maps zero or more string names to corresponding child nodes.
 * Depending on context, a Tree instance can be interpreted as
 * representing just that tree node, the subtree starting at that node,
 * or an entire tree in case it's a root node.
 * <p>
 * For familiarity and easy integration with existing libraries this
 * interface extends the generic {@link Map} interface instead of
 * providing a custom alternative. Note also that this interface is
 * named Tree instead of something like Item or Node to avoid confusion
 * with the related JCR interfaces.
 * </p>
 *
 * <h2>Leaves and non-leaves</h2>
 * <p>
 * Normal tree nodes only contain structural information expressed as
 * the set of child nodes and their names. The content of a tree, expressed
 * in data types like strings, numbers and binaries, is stored in special
 * leaf nodes with no children. Such leaf nodes implement the {@link Leaf}
 * sub-interface and can be identified and accessed using the
 * {@link #isLeaf()} and {@link #asLeaf()} methods.
 * </p>
 * <p>
 * Note that even tough such leaf nodes are guaranteed to have no children
 * (i.e. {@link #isLeaf()} implies {@link #isEmpty()}), the reverse is not
 * necessarily true. It's possible for a non-leaf node to contain no children,
 * though such cases occur normally only transiently when new subtrees are
 * being constructed.
 * </p>
 *
 * <h2>Mutability and thread-safety</h2>
 * <p>
 * Tree objects are immutable by default and thus safe for concurrent access.
 * Using a mutator method like {@link #clear()} or {@link #put(String, Tree)}
 * results in an {@link UnsupportedOperationException exception}. A new Tree
 * instance is needed to express a modified content tree. As a result it's
 * safe to repeat operations like iterating over all child nodes of a Tree
 * instance and expect results to be the same.
 * </p>
 * <p>
 * In specific situations like when constructing new trees it's possible for
 * Tree instances to be mutable. Such cases need to be explicitly documented
 * and managed in a way that prevents thread-safety issues, for example by
 * keeping a reference to such a mutable Tree instance local to a single
 * thread.
 * </p>
 *
 * <h2>Persistence and error-handling</h2>
 * <p>
 * A Tree instance can be (and often is) backed by local files or network
 * resources. All IO operations or related concerns like caching should be
 * handled transparently below this interface. Potential IO problems and
 * recovery attempts like retrying a timed-out network access need to be
 * handled below this interface, and only hard errors should be thrown up
 * as {@link RuntimeException unchecked exceptions} that higher level code
 * is not expected to be able to recover from.
 * </p>
 * <p>
 * Since this interface exposes no higher level constructs like access
 * controls, locking, node types or even path parsing, there's no way
 * for content access to fail because of such concerns. Such functionality
 * and related checked exceptions or other control flow constructs should
 * be implemented on a higher level above this interface.
 * </p>
 *
 * <h2>Decoration and virtual content</h2>
 * <p>
 * Not all content exposed by Tree objects needs to be backed by actual
 * persisted data. An implementation may want to provide provide derived
 * data like for example the aggregate size of the entire subtree as an
 * extra virtual leaf node. A virtualization, sharding or caching layer
 * could provide a composite view over multiple underlying content trees.
 * Or a basic access control layer could decide to hide certain content
 * based on specific rules. All such features need to be implemented
 * according to the API contract of this interface. A separate higher level
 * interface needs to be used if an implementation can't for example
 * guarantee immutability of exposed content as discussed above.
 * </p>
 */
interface Tree extends Map<String, Tree> {

    /**
     * Checks whether this is a {@link Leaf} instance. Can be used to
     * control program flow without explicit <code>instanceof</code> checks
     * for handling leaf content. See also the {@link #asLeaf()} method
     * that can additionally take care of type casting.
     *
     * @return <code>true</code> if this is a {@link Leaf},
     *         <code>false</code> if not
     */
    boolean isLeaf();

    /**
     * Returns this instance as a {@link Leaf} if possible. Can be used
     * to access leaf nodes without <code>instanceof</code> checks or
     * explicit type casting. A typical access pattern is:
     * <pre>
     * Leaf leaf = tree.asLeaf();
     * if (leaf != null) {
     *     // handle leaf content
     * } else {
     *     // handle non-leaf content
     * }
     * </pre>
     *
     * @return this instance as a {@link Leaf},
     *         or <code>null</code> if this is a non-leaf node
     */
    Leaf asLeaf();

}

/**
 * Leaves are special {@link Tree} nodes contain typed data like strings,
 * numbers, binaries, etc. This interface extends {@link Tree} and thus
 * also {@link Map}, but all Leaf instances are guaranteed to contain zero
 * child nodes. Leaves are always immutable.
 */
interface Leaf extends Tree {

    // TODO: Add data access methods

}

Mime
View raw message