jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dominique Pfister <dpfis...@adobe.com>
Subject Re: [jr3] Tree model
Date Tue, 28 Feb 2012 21:26:35 GMT
Hi Michael,

On Feb 28, 2012, at 19:11, "Michael Dürig" <mduerig@apache.org> wrote:

> 
> Hi Jukka,
> 
> Thanks for bringing this up. In a nutshell what you are proposing is to 
> implement trees as persistent data structures.

Hm, that's rather wishful thinking. Right now, we enforce an MVCC model inside the MicroKernel
- which might be overthrown by some future decision, of course - so revisions and nodes are
immutable per se. I don't see any need to introduce semantical restrictions that are already
mandated by the architecture.

Dominique

> I think this approach is 
> very valuable and will save us a lot of troubles in the long run. In 
> particular when it comes to concurrency issues explicitly handling 
> mutual access to stateful objects has proofed very troublesome. With an 
> approach like the one you are proposing we get concurrency virtually for 
> free. To that end, I'd even go further and remove that last bit of 
> mutability you mention in the Javadoc. For constructing new trees I'd 
> rather provide factories which can produce new trees from existing 
> trees. Furthermore I'd rather not extend from Map to further emphasis 
> the immutability aspect. Instead I'd provide a map view. Either by 
> providing an asMap() method or by means of an adapter class.
> 
> Anyway, I'm sure having a minimal and simple interface for such an 
> important key concept will be of great value.
> 
> Michael
> 
> 
> 
> On 28.2.12 14:59, Jukka Zitting wrote:
>> Hi,
>> 
>> [Here's me looking at lower-level details for a change.]
>> 
>> I was going through the prototype codebases in the sandbox, trying to
>> come up with some clean and simple lowest-common-denominator -style
>> interface for representing content trees. Basically a replacement for
>> the ItemState model in current Jackrabbit.
>> 
>> The trouble I find with both the current Jackrabbit ItemState model
>> and the efforts in the sandbox is that this key concept is modeled as
>> concrete classes instead of as interfaces. Using an interface to
>> describe and document the fundamental tree model gives us a lot of
>> extra freedom on the implementation side (lazy loading, decoration,
>> virtual content, etc.).
>> 
>> So, how should we go about constructing such an interface? I started
>> by laying some ground rules based on concepts from the sandbox and
>> past jr3 discussions:
>> 
>>   * A content tree is composed of a hierachy of items
>>   * Tree items are either leaves or non-leaves
>>   * Non-leaves contain zero or more named child items (*all* other
>> data is stored at leaves)
>>   * Each child item is *uniquely* named within its parent
>>   * Names are just opaque strings
>>   * Leaves contain typed data (strings, numbers, binaries, etc.)
>>   * Content trees are *immutable* except in specific circumstances
>> (transient changes)
>> 
>> As a corollary of such a proposed design, the following features (that
>> with a different tree model could be a part of the underlying storage
>> model) would need to be handled as higher level constructs:
>> 
>>   * Same-name-siblings (perhaps by name-mangling)
>>   * Namespaces and other name semantics
>>   * Ordering of child nodes (perhaps with a custom order property)
>>   * Path handling
>>   * Identifiers and references
>>   * Node types
>>   * Versioning
>>   * etc., etc.
>> 
>> As you can see, it's a very low-level interface I'm talking about.
>> With that background, here's what I'm proposing:
>> https://gist.github.com/1932695 (also included as text at the end of
>> this message). Note that this proposal covers only the interfaces for
>> accessing content (with a big TODO in the Leaf interface). A separate
>> builder or factory interface will be needed for content changes in
>> case this design is adopted.
>> 
>> Please criticize, as this is just a quick draft and I'm likely to miss
>> something fairly important. I'm hoping to evolve this to something we
>> could use as a well-documented and thought-of internal abstraction for
>> jr3. Or, if this idea is too broken, to provoke someone to provide a
>> good counter-proposal. :-)
>> 
>> BR,
>> 
>> Jukka Zitting
>> 
>> ----
>> 
>> import java.util.Map;
>> 
>> /**
>>  * Trees are the key concept in a hierarchical content repository.
>>  * This interface is a low-level tree node representation that just
>>  * maps zero or more string names to corresponding child nodes.
>>  * Depending on context, a Tree instance can be interpreted as
>>  * representing just that tree node, the subtree starting at that node,
>>  * or an entire tree in case it's a root node.
>>  *<p>
>>  * For familiarity and easy integration with existing libraries this
>>  * interface extends the generic {@link Map} interface instead of
>>  * providing a custom alternative. Note also that this interface is
>>  * named Tree instead of something like Item or Node to avoid confusion
>>  * with the related JCR interfaces.
>>  *</p>
>>  *
>>  *<h2>Leaves and non-leaves</h2>
>>  *<p>
>>  * Normal tree nodes only contain structural information expressed as
>>  * the set of child nodes and their names. The content of a tree, expressed
>>  * in data types like strings, numbers and binaries, is stored in special
>>  * leaf nodes with no children. Such leaf nodes implement the {@link Leaf}
>>  * sub-interface and can be identified and accessed using the
>>  * {@link #isLeaf()} and {@link #asLeaf()} methods.
>>  *</p>
>>  *<p>
>>  * Note that even tough such leaf nodes are guaranteed to have no children
>>  * (i.e. {@link #isLeaf()} implies {@link #isEmpty()}), the reverse is not
>>  * necessarily true. It's possible for a non-leaf node to contain no children,
>>  * though such cases occur normally only transiently when new subtrees are
>>  * being constructed.
>>  *</p>
>>  *
>>  *<h2>Mutability and thread-safety</h2>
>>  *<p>
>>  * Tree objects are immutable by default and thus safe for concurrent access.
>>  * Using a mutator method like {@link #clear()} or {@link #put(String, Tree)}
>>  * results in an {@link UnsupportedOperationException exception}. A new Tree
>>  * instance is needed to express a modified content tree. As a result it's
>>  * safe to repeat operations like iterating over all child nodes of a Tree
>>  * instance and expect results to be the same.
>>  *</p>
>>  *<p>
>>  * In specific situations like when constructing new trees it's possible for
>>  * Tree instances to be mutable. Such cases need to be explicitly documented
>>  * and managed in a way that prevents thread-safety issues, for example by
>>  * keeping a reference to such a mutable Tree instance local to a single
>>  * thread.
>>  *</p>
>>  *
>>  *<h2>Persistence and error-handling</h2>
>>  *<p>
>>  * A Tree instance can be (and often is) backed by local files or network
>>  * resources. All IO operations or related concerns like caching should be
>>  * handled transparently below this interface. Potential IO problems and
>>  * recovery attempts like retrying a timed-out network access need to be
>>  * handled below this interface, and only hard errors should be thrown up
>>  * as {@link RuntimeException unchecked exceptions} that higher level code
>>  * is not expected to be able to recover from.
>>  *</p>
>>  *<p>
>>  * Since this interface exposes no higher level constructs like access
>>  * controls, locking, node types or even path parsing, there's no way
>>  * for content access to fail because of such concerns. Such functionality
>>  * and related checked exceptions or other control flow constructs should
>>  * be implemented on a higher level above this interface.
>>  *</p>
>>  *
>>  *<h2>Decoration and virtual content</h2>
>>  *<p>
>>  * Not all content exposed by Tree objects needs to be backed by actual
>>  * persisted data. An implementation may want to provide provide derived
>>  * data like for example the aggregate size of the entire subtree as an
>>  * extra virtual leaf node. A virtualization, sharding or caching layer
>>  * could provide a composite view over multiple underlying content trees.
>>  * Or a basic access control layer could decide to hide certain content
>>  * based on specific rules. All such features need to be implemented
>>  * according to the API contract of this interface. A separate higher level
>>  * interface needs to be used if an implementation can't for example
>>  * guarantee immutability of exposed content as discussed above.
>>  *</p>
>>  */
>> interface Tree extends Map<String, Tree>  {
>> 
>>     /**
>>      * Checks whether this is a {@link Leaf} instance. Can be used to
>>      * control program flow without explicit<code>instanceof</code> 
checks
>>      * for handling leaf content. See also the {@link #asLeaf()} method
>>      * that can additionally take care of type casting.
>>      *
>>      * @return<code>true</code>  if this is a {@link Leaf},
>>      *<code>false</code>  if not
>>      */
>>     boolean isLeaf();
>> 
>>     /**
>>      * Returns this instance as a {@link Leaf} if possible. Can be used
>>      * to access leaf nodes without<code>instanceof</code>  checks or
>>      * explicit type casting. A typical access pattern is:
>>      *<pre>
>>      * Leaf leaf = tree.asLeaf();
>>      * if (leaf != null) {
>>      *     // handle leaf content
>>      * } else {
>>      *     // handle non-leaf content
>>      * }
>>      *</pre>
>>      *
>>      * @return this instance as a {@link Leaf},
>>      *         or<code>null</code>  if this is a non-leaf node
>>      */
>>     Leaf asLeaf();
>> 
>> }
>> 
>> /**
>>  * Leaves are special {@link Tree} nodes contain typed data like strings,
>>  * numbers, binaries, etc. This interface extends {@link Tree} and thus
>>  * also {@link Map}, but all Leaf instances are guaranteed to contain zero
>>  * child nodes. Leaves are always immutable.
>>  */
>> interface Leaf extends Tree {
>> 
>>     // TODO: Add data access methods
>> 
>> }
Mime
View raw message