jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mdue...@apache.org>
Subject Re: [jr3] Tree model
Date Wed, 29 Feb 2012 01:04:50 GMT


On 28.2.12 21:26, Dominique Pfister wrote:
> Hi Michael,
>
> On Feb 28, 2012, at 19:11, "Michael Dürig"<mduerig@apache.org>  wrote:
>
>>
>> Hi Jukka,
>>
>> Thanks for bringing this up. In a nutshell what you are proposing is to
>> implement trees as persistent data structures.
>
> Hm, that's rather wishful thinking. Right now, we enforce an MVCC model inside the MicroKernel
- which might be overthrown by some future decision, of course - so revisions and nodes are
immutable per se. I don't see any need to introduce semantical restrictions that are already
mandated by the architecture.

I don't think there is any semantical restriction at all here. Using 
persistent data structures [1] does not restrict us in any way, but 
rather provide us with much nicer properties regarding concurrency.

Michael

[1] http://en.wikipedia.org/wiki/Persistent_data_structure

>
> Dominique
>
>> I think this approach is
>> very valuable and will save us a lot of troubles in the long run. In
>> particular when it comes to concurrency issues explicitly handling
>> mutual access to stateful objects has proofed very troublesome. With an
>> approach like the one you are proposing we get concurrency virtually for
>> free. To that end, I'd even go further and remove that last bit of
>> mutability you mention in the Javadoc. For constructing new trees I'd
>> rather provide factories which can produce new trees from existing
>> trees. Furthermore I'd rather not extend from Map to further emphasis
>> the immutability aspect. Instead I'd provide a map view. Either by
>> providing an asMap() method or by means of an adapter class.
>>
>> Anyway, I'm sure having a minimal and simple interface for such an
>> important key concept will be of great value.
>>
>> Michael
>>
>>
>>
>> On 28.2.12 14:59, Jukka Zitting wrote:
>>> Hi,
>>>
>>> [Here's me looking at lower-level details for a change.]
>>>
>>> I was going through the prototype codebases in the sandbox, trying to
>>> come up with some clean and simple lowest-common-denominator -style
>>> interface for representing content trees. Basically a replacement for
>>> the ItemState model in current Jackrabbit.
>>>
>>> The trouble I find with both the current Jackrabbit ItemState model
>>> and the efforts in the sandbox is that this key concept is modeled as
>>> concrete classes instead of as interfaces. Using an interface to
>>> describe and document the fundamental tree model gives us a lot of
>>> extra freedom on the implementation side (lazy loading, decoration,
>>> virtual content, etc.).
>>>
>>> So, how should we go about constructing such an interface? I started
>>> by laying some ground rules based on concepts from the sandbox and
>>> past jr3 discussions:
>>>
>>>    * A content tree is composed of a hierachy of items
>>>    * Tree items are either leaves or non-leaves
>>>    * Non-leaves contain zero or more named child items (*all* other
>>> data is stored at leaves)
>>>    * Each child item is *uniquely* named within its parent
>>>    * Names are just opaque strings
>>>    * Leaves contain typed data (strings, numbers, binaries, etc.)
>>>    * Content trees are *immutable* except in specific circumstances
>>> (transient changes)
>>>
>>> As a corollary of such a proposed design, the following features (that
>>> with a different tree model could be a part of the underlying storage
>>> model) would need to be handled as higher level constructs:
>>>
>>>    * Same-name-siblings (perhaps by name-mangling)
>>>    * Namespaces and other name semantics
>>>    * Ordering of child nodes (perhaps with a custom order property)
>>>    * Path handling
>>>    * Identifiers and references
>>>    * Node types
>>>    * Versioning
>>>    * etc., etc.
>>>
>>> As you can see, it's a very low-level interface I'm talking about.
>>> With that background, here's what I'm proposing:
>>> https://gist.github.com/1932695 (also included as text at the end of
>>> this message). Note that this proposal covers only the interfaces for
>>> accessing content (with a big TODO in the Leaf interface). A separate
>>> builder or factory interface will be needed for content changes in
>>> case this design is adopted.
>>>
>>> Please criticize, as this is just a quick draft and I'm likely to miss
>>> something fairly important. I'm hoping to evolve this to something we
>>> could use as a well-documented and thought-of internal abstraction for
>>> jr3. Or, if this idea is too broken, to provoke someone to provide a
>>> good counter-proposal. :-)
>>>
>>> BR,
>>>
>>> Jukka Zitting
>>>
>>> ----
>>>
>>> import java.util.Map;
>>>
>>> /**
>>>   * Trees are the key concept in a hierarchical content repository.
>>>   * This interface is a low-level tree node representation that just
>>>   * maps zero or more string names to corresponding child nodes.
>>>   * Depending on context, a Tree instance can be interpreted as
>>>   * representing just that tree node, the subtree starting at that node,
>>>   * or an entire tree in case it's a root node.
>>>   *<p>
>>>   * For familiarity and easy integration with existing libraries this
>>>   * interface extends the generic {@link Map} interface instead of
>>>   * providing a custom alternative. Note also that this interface is
>>>   * named Tree instead of something like Item or Node to avoid confusion
>>>   * with the related JCR interfaces.
>>>   *</p>
>>>   *
>>>   *<h2>Leaves and non-leaves</h2>
>>>   *<p>
>>>   * Normal tree nodes only contain structural information expressed as
>>>   * the set of child nodes and their names. The content of a tree, expressed
>>>   * in data types like strings, numbers and binaries, is stored in special
>>>   * leaf nodes with no children. Such leaf nodes implement the {@link Leaf}
>>>   * sub-interface and can be identified and accessed using the
>>>   * {@link #isLeaf()} and {@link #asLeaf()} methods.
>>>   *</p>
>>>   *<p>
>>>   * Note that even tough such leaf nodes are guaranteed to have no children
>>>   * (i.e. {@link #isLeaf()} implies {@link #isEmpty()}), the reverse is not
>>>   * necessarily true. It's possible for a non-leaf node to contain no children,
>>>   * though such cases occur normally only transiently when new subtrees are
>>>   * being constructed.
>>>   *</p>
>>>   *
>>>   *<h2>Mutability and thread-safety</h2>
>>>   *<p>
>>>   * Tree objects are immutable by default and thus safe for concurrent access.
>>>   * Using a mutator method like {@link #clear()} or {@link #put(String, Tree)}
>>>   * results in an {@link UnsupportedOperationException exception}. A new Tree
>>>   * instance is needed to express a modified content tree. As a result it's
>>>   * safe to repeat operations like iterating over all child nodes of a Tree
>>>   * instance and expect results to be the same.
>>>   *</p>
>>>   *<p>
>>>   * In specific situations like when constructing new trees it's possible for
>>>   * Tree instances to be mutable. Such cases need to be explicitly documented
>>>   * and managed in a way that prevents thread-safety issues, for example by
>>>   * keeping a reference to such a mutable Tree instance local to a single
>>>   * thread.
>>>   *</p>
>>>   *
>>>   *<h2>Persistence and error-handling</h2>
>>>   *<p>
>>>   * A Tree instance can be (and often is) backed by local files or network
>>>   * resources. All IO operations or related concerns like caching should be
>>>   * handled transparently below this interface. Potential IO problems and
>>>   * recovery attempts like retrying a timed-out network access need to be
>>>   * handled below this interface, and only hard errors should be thrown up
>>>   * as {@link RuntimeException unchecked exceptions} that higher level code
>>>   * is not expected to be able to recover from.
>>>   *</p>
>>>   *<p>
>>>   * Since this interface exposes no higher level constructs like access
>>>   * controls, locking, node types or even path parsing, there's no way
>>>   * for content access to fail because of such concerns. Such functionality
>>>   * and related checked exceptions or other control flow constructs should
>>>   * be implemented on a higher level above this interface.
>>>   *</p>
>>>   *
>>>   *<h2>Decoration and virtual content</h2>
>>>   *<p>
>>>   * Not all content exposed by Tree objects needs to be backed by actual
>>>   * persisted data. An implementation may want to provide provide derived
>>>   * data like for example the aggregate size of the entire subtree as an
>>>   * extra virtual leaf node. A virtualization, sharding or caching layer
>>>   * could provide a composite view over multiple underlying content trees.
>>>   * Or a basic access control layer could decide to hide certain content
>>>   * based on specific rules. All such features need to be implemented
>>>   * according to the API contract of this interface. A separate higher level
>>>   * interface needs to be used if an implementation can't for example
>>>   * guarantee immutability of exposed content as discussed above.
>>>   *</p>
>>>   */
>>> interface Tree extends Map<String, Tree>   {
>>>
>>>      /**
>>>       * Checks whether this is a {@link Leaf} instance. Can be used to
>>>       * control program flow without explicit<code>instanceof</code>
  checks
>>>       * for handling leaf content. See also the {@link #asLeaf()} method
>>>       * that can additionally take care of type casting.
>>>       *
>>>       * @return<code>true</code>   if this is a {@link Leaf},
>>>       *<code>false</code>   if not
>>>       */
>>>      boolean isLeaf();
>>>
>>>      /**
>>>       * Returns this instance as a {@link Leaf} if possible. Can be used
>>>       * to access leaf nodes without<code>instanceof</code>   checks
or
>>>       * explicit type casting. A typical access pattern is:
>>>       *<pre>
>>>       * Leaf leaf = tree.asLeaf();
>>>       * if (leaf != null) {
>>>       *     // handle leaf content
>>>       * } else {
>>>       *     // handle non-leaf content
>>>       * }
>>>       *</pre>
>>>       *
>>>       * @return this instance as a {@link Leaf},
>>>       *         or<code>null</code>   if this is a non-leaf node
>>>       */
>>>      Leaf asLeaf();
>>>
>>> }
>>>
>>> /**
>>>   * Leaves are special {@link Tree} nodes contain typed data like strings,
>>>   * numbers, binaries, etc. This interface extends {@link Tree} and thus
>>>   * also {@link Map}, but all Leaf instances are guaranteed to contain zero
>>>   * child nodes. Leaves are always immutable.
>>>   */
>>> interface Leaf extends Tree {
>>>
>>>      // TODO: Add data access methods
>>>
>>> }


Mime
View raw message