jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mdue...@apache.org>
Subject Re: [jr3] Tree model
Date Tue, 28 Feb 2012 18:10:24 GMT

Hi Jukka,

Thanks for bringing this up. In a nutshell what you are proposing is to 
implement trees as persistent data structures. I think this approach is 
very valuable and will save us a lot of troubles in the long run. In 
particular when it comes to concurrency issues explicitly handling 
mutual access to stateful objects has proofed very troublesome. With an 
approach like the one you are proposing we get concurrency virtually for 
free. To that end, I'd even go further and remove that last bit of 
mutability you mention in the Javadoc. For constructing new trees I'd 
rather provide factories which can produce new trees from existing 
trees. Furthermore I'd rather not extend from Map to further emphasis 
the immutability aspect. Instead I'd provide a map view. Either by 
providing an asMap() method or by means of an adapter class.

Anyway, I'm sure having a minimal and simple interface for such an 
important key concept will be of great value.


On 28.2.12 14:59, Jukka Zitting wrote:
> Hi,
> [Here's me looking at lower-level details for a change.]
> I was going through the prototype codebases in the sandbox, trying to
> come up with some clean and simple lowest-common-denominator -style
> interface for representing content trees. Basically a replacement for
> the ItemState model in current Jackrabbit.
> The trouble I find with both the current Jackrabbit ItemState model
> and the efforts in the sandbox is that this key concept is modeled as
> concrete classes instead of as interfaces. Using an interface to
> describe and document the fundamental tree model gives us a lot of
> extra freedom on the implementation side (lazy loading, decoration,
> virtual content, etc.).
> So, how should we go about constructing such an interface? I started
> by laying some ground rules based on concepts from the sandbox and
> past jr3 discussions:
>    * A content tree is composed of a hierachy of items
>    * Tree items are either leaves or non-leaves
>    * Non-leaves contain zero or more named child items (*all* other
> data is stored at leaves)
>    * Each child item is *uniquely* named within its parent
>    * Names are just opaque strings
>    * Leaves contain typed data (strings, numbers, binaries, etc.)
>    * Content trees are *immutable* except in specific circumstances
> (transient changes)
> As a corollary of such a proposed design, the following features (that
> with a different tree model could be a part of the underlying storage
> model) would need to be handled as higher level constructs:
>    * Same-name-siblings (perhaps by name-mangling)
>    * Namespaces and other name semantics
>    * Ordering of child nodes (perhaps with a custom order property)
>    * Path handling
>    * Identifiers and references
>    * Node types
>    * Versioning
>    * etc., etc.
> As you can see, it's a very low-level interface I'm talking about.
> With that background, here's what I'm proposing:
> https://gist.github.com/1932695 (also included as text at the end of
> this message). Note that this proposal covers only the interfaces for
> accessing content (with a big TODO in the Leaf interface). A separate
> builder or factory interface will be needed for content changes in
> case this design is adopted.
> Please criticize, as this is just a quick draft and I'm likely to miss
> something fairly important. I'm hoping to evolve this to something we
> could use as a well-documented and thought-of internal abstraction for
> jr3. Or, if this idea is too broken, to provoke someone to provide a
> good counter-proposal. :-)
> BR,
> Jukka Zitting
> ----
> import java.util.Map;
> /**
>   * Trees are the key concept in a hierarchical content repository.
>   * This interface is a low-level tree node representation that just
>   * maps zero or more string names to corresponding child nodes.
>   * Depending on context, a Tree instance can be interpreted as
>   * representing just that tree node, the subtree starting at that node,
>   * or an entire tree in case it's a root node.
>   *<p>
>   * For familiarity and easy integration with existing libraries this
>   * interface extends the generic {@link Map} interface instead of
>   * providing a custom alternative. Note also that this interface is
>   * named Tree instead of something like Item or Node to avoid confusion
>   * with the related JCR interfaces.
>   *</p>
>   *
>   *<h2>Leaves and non-leaves</h2>
>   *<p>
>   * Normal tree nodes only contain structural information expressed as
>   * the set of child nodes and their names. The content of a tree, expressed
>   * in data types like strings, numbers and binaries, is stored in special
>   * leaf nodes with no children. Such leaf nodes implement the {@link Leaf}
>   * sub-interface and can be identified and accessed using the
>   * {@link #isLeaf()} and {@link #asLeaf()} methods.
>   *</p>
>   *<p>
>   * Note that even tough such leaf nodes are guaranteed to have no children
>   * (i.e. {@link #isLeaf()} implies {@link #isEmpty()}), the reverse is not
>   * necessarily true. It's possible for a non-leaf node to contain no children,
>   * though such cases occur normally only transiently when new subtrees are
>   * being constructed.
>   *</p>
>   *
>   *<h2>Mutability and thread-safety</h2>
>   *<p>
>   * Tree objects are immutable by default and thus safe for concurrent access.
>   * Using a mutator method like {@link #clear()} or {@link #put(String, Tree)}
>   * results in an {@link UnsupportedOperationException exception}. A new Tree
>   * instance is needed to express a modified content tree. As a result it's
>   * safe to repeat operations like iterating over all child nodes of a Tree
>   * instance and expect results to be the same.
>   *</p>
>   *<p>
>   * In specific situations like when constructing new trees it's possible for
>   * Tree instances to be mutable. Such cases need to be explicitly documented
>   * and managed in a way that prevents thread-safety issues, for example by
>   * keeping a reference to such a mutable Tree instance local to a single
>   * thread.
>   *</p>
>   *
>   *<h2>Persistence and error-handling</h2>
>   *<p>
>   * A Tree instance can be (and often is) backed by local files or network
>   * resources. All IO operations or related concerns like caching should be
>   * handled transparently below this interface. Potential IO problems and
>   * recovery attempts like retrying a timed-out network access need to be
>   * handled below this interface, and only hard errors should be thrown up
>   * as {@link RuntimeException unchecked exceptions} that higher level code
>   * is not expected to be able to recover from.
>   *</p>
>   *<p>
>   * Since this interface exposes no higher level constructs like access
>   * controls, locking, node types or even path parsing, there's no way
>   * for content access to fail because of such concerns. Such functionality
>   * and related checked exceptions or other control flow constructs should
>   * be implemented on a higher level above this interface.
>   *</p>
>   *
>   *<h2>Decoration and virtual content</h2>
>   *<p>
>   * Not all content exposed by Tree objects needs to be backed by actual
>   * persisted data. An implementation may want to provide provide derived
>   * data like for example the aggregate size of the entire subtree as an
>   * extra virtual leaf node. A virtualization, sharding or caching layer
>   * could provide a composite view over multiple underlying content trees.
>   * Or a basic access control layer could decide to hide certain content
>   * based on specific rules. All such features need to be implemented
>   * according to the API contract of this interface. A separate higher level
>   * interface needs to be used if an implementation can't for example
>   * guarantee immutability of exposed content as discussed above.
>   *</p>
>   */
> interface Tree extends Map<String, Tree>  {
>      /**
>       * Checks whether this is a {@link Leaf} instance. Can be used to
>       * control program flow without explicit<code>instanceof</code>  checks
>       * for handling leaf content. See also the {@link #asLeaf()} method
>       * that can additionally take care of type casting.
>       *
>       * @return<code>true</code>  if this is a {@link Leaf},
>       *<code>false</code>  if not
>       */
>      boolean isLeaf();
>      /**
>       * Returns this instance as a {@link Leaf} if possible. Can be used
>       * to access leaf nodes without<code>instanceof</code>  checks or
>       * explicit type casting. A typical access pattern is:
>       *<pre>
>       * Leaf leaf = tree.asLeaf();
>       * if (leaf != null) {
>       *     // handle leaf content
>       * } else {
>       *     // handle non-leaf content
>       * }
>       *</pre>
>       *
>       * @return this instance as a {@link Leaf},
>       *         or<code>null</code>  if this is a non-leaf node
>       */
>      Leaf asLeaf();
> }
> /**
>   * Leaves are special {@link Tree} nodes contain typed data like strings,
>   * numbers, binaries, etc. This interface extends {@link Tree} and thus
>   * also {@link Map}, but all Leaf instances are guaranteed to contain zero
>   * child nodes. Leaves are always immutable.
>   */
> interface Leaf extends Tree {
>      // TODO: Add data access methods
> }

View raw message