jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Felix Meschberger <fmesc...@adobe.com>
Subject Re: [jr3] Tree model
Date Wed, 29 Feb 2012 07:27:08 GMT
Hi,

Am 28.02.2012 um 16:11 schrieb Thomas Mueller:

> Hi,
> 
>> import java.util.Map;
>> ... extends Map<String, Tree>
> 
> For a low level interface, that would be a lot of methods to be
> implemented, with special semantics that might not be desirable (for
> example put returns the old data).

I kind of like the idea of a Tree being a Map. In fact it is so IMHO.

Regards
Felix

> 
> Regards,
> Thomas
> 
> 
> 
> 
> 
> On 2/28/12 4:08 PM, "Thomas Mueller" <mueller@adobe.com> wrote:
> 
>> Hi,
>> 
>> What is your use case, meaning where would this interface be used and why?
>> 
>> Regards,
>> Thomas
>> 
>> 
>> On 2/28/12 3:59 PM, "Jukka Zitting" <jukka.zitting@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> [Here's me looking at lower-level details for a change.]
>>> 
>>> I was going through the prototype codebases in the sandbox, trying to
>>> come up with some clean and simple lowest-common-denominator -style
>>> interface for representing content trees. Basically a replacement for
>>> the ItemState model in current Jackrabbit.
>>> 
>>> The trouble I find with both the current Jackrabbit ItemState model
>>> and the efforts in the sandbox is that this key concept is modeled as
>>> concrete classes instead of as interfaces. Using an interface to
>>> describe and document the fundamental tree model gives us a lot of
>>> extra freedom on the implementation side (lazy loading, decoration,
>>> virtual content, etc.).
>>> 
>>> So, how should we go about constructing such an interface? I started
>>> by laying some ground rules based on concepts from the sandbox and
>>> past jr3 discussions:
>>> 
>>> * A content tree is composed of a hierachy of items
>>> * Tree items are either leaves or non-leaves
>>> * Non-leaves contain zero or more named child items (*all* other
>>> data is stored at leaves)
>>> * Each child item is *uniquely* named within its parent
>>> * Names are just opaque strings
>>> * Leaves contain typed data (strings, numbers, binaries, etc.)
>>> * Content trees are *immutable* except in specific circumstances
>>> (transient changes)
>>> 
>>> As a corollary of such a proposed design, the following features (that
>>> with a different tree model could be a part of the underlying storage
>>> model) would need to be handled as higher level constructs:
>>> 
>>> * Same-name-siblings (perhaps by name-mangling)
>>> * Namespaces and other name semantics
>>> * Ordering of child nodes (perhaps with a custom order property)
>>> * Path handling
>>> * Identifiers and references
>>> * Node types
>>> * Versioning
>>> * etc., etc.
>>> 
>>> As you can see, it's a very low-level interface I'm talking about.
>>> With that background, here's what I'm proposing:
>>> https://gist.github.com/1932695 (also included as text at the end of
>>> this message). Note that this proposal covers only the interfaces for
>>> accessing content (with a big TODO in the Leaf interface). A separate
>>> builder or factory interface will be needed for content changes in
>>> case this design is adopted.
>>> 
>>> Please criticize, as this is just a quick draft and I'm likely to miss
>>> something fairly important. I'm hoping to evolve this to something we
>>> could use as a well-documented and thought-of internal abstraction for
>>> jr3. Or, if this idea is too broken, to provoke someone to provide a
>>> good counter-proposal. :-)
>>> 
>>> BR,
>>> 
>>> Jukka Zitting
>>> 
>>> ----
>>> 
>>> import java.util.Map;
>>> 
>>> /**
>>> * Trees are the key concept in a hierarchical content repository.
>>> * This interface is a low-level tree node representation that just
>>> * maps zero or more string names to corresponding child nodes.
>>> * Depending on context, a Tree instance can be interpreted as
>>> * representing just that tree node, the subtree starting at that node,
>>> * or an entire tree in case it's a root node.
>>> * <p>
>>> * For familiarity and easy integration with existing libraries this
>>> * interface extends the generic {@link Map} interface instead of
>>> * providing a custom alternative. Note also that this interface is
>>> * named Tree instead of something like Item or Node to avoid confusion
>>> * with the related JCR interfaces.
>>> * </p>
>>> *
>>> * <h2>Leaves and non-leaves</h2>
>>> * <p>
>>> * Normal tree nodes only contain structural information expressed as
>>> * the set of child nodes and their names. The content of a tree,
>>> expressed
>>> * in data types like strings, numbers and binaries, is stored in special
>>> * leaf nodes with no children. Such leaf nodes implement the {@link
>>> Leaf}
>>> * sub-interface and can be identified and accessed using the
>>> * {@link #isLeaf()} and {@link #asLeaf()} methods.
>>> * </p>
>>> * <p>
>>> * Note that even tough such leaf nodes are guaranteed to have no
>>> children
>>> * (i.e. {@link #isLeaf()} implies {@link #isEmpty()}), the reverse is
>>> not
>>> * necessarily true. It's possible for a non-leaf node to contain no
>>> children,
>>> * though such cases occur normally only transiently when new subtrees
>>> are
>>> * being constructed.
>>> * </p>
>>> *
>>> * <h2>Mutability and thread-safety</h2>
>>> * <p>
>>> * Tree objects are immutable by default and thus safe for concurrent
>>> access.
>>> * Using a mutator method like {@link #clear()} or {@link #put(String,
>>> Tree)}
>>> * results in an {@link UnsupportedOperationException exception}. A new
>>> Tree
>>> * instance is needed to express a modified content tree. As a result
>>> it's
>>> * safe to repeat operations like iterating over all child nodes of a
>>> Tree
>>> * instance and expect results to be the same.
>>> * </p>
>>> * <p>
>>> * In specific situations like when constructing new trees it's possible
>>> for
>>> * Tree instances to be mutable. Such cases need to be explicitly
>>> documented
>>> * and managed in a way that prevents thread-safety issues, for example
>>> by
>>> * keeping a reference to such a mutable Tree instance local to a single
>>> * thread.
>>> * </p>
>>> *
>>> * <h2>Persistence and error-handling</h2>
>>> * <p>
>>> * A Tree instance can be (and often is) backed by local files or network
>>> * resources. All IO operations or related concerns like caching should
>>> be
>>> * handled transparently below this interface. Potential IO problems and
>>> * recovery attempts like retrying a timed-out network access need to be
>>> * handled below this interface, and only hard errors should be thrown up
>>> * as {@link RuntimeException unchecked exceptions} that higher level
>>> code
>>> * is not expected to be able to recover from.
>>> * </p>
>>> * <p>
>>> * Since this interface exposes no higher level constructs like access
>>> * controls, locking, node types or even path parsing, there's no way
>>> * for content access to fail because of such concerns. Such
>>> functionality
>>> * and related checked exceptions or other control flow constructs should
>>> * be implemented on a higher level above this interface.
>>> * </p>
>>> *
>>> * <h2>Decoration and virtual content</h2>
>>> * <p>
>>> * Not all content exposed by Tree objects needs to be backed by actual
>>> * persisted data. An implementation may want to provide provide derived
>>> * data like for example the aggregate size of the entire subtree as an
>>> * extra virtual leaf node. A virtualization, sharding or caching layer
>>> * could provide a composite view over multiple underlying content trees.
>>> * Or a basic access control layer could decide to hide certain content
>>> * based on specific rules. All such features need to be implemented
>>> * according to the API contract of this interface. A separate higher
>>> level
>>> * interface needs to be used if an implementation can't for example
>>> * guarantee immutability of exposed content as discussed above.
>>> * </p>
>>> */
>>> interface Tree extends Map<String, Tree> {
>>> 
>>>   /**
>>>    * Checks whether this is a {@link Leaf} instance. Can be used to
>>>    * control program flow without explicit <code>instanceof</code>
>>> checks
>>>    * for handling leaf content. See also the {@link #asLeaf()} method
>>>    * that can additionally take care of type casting.
>>>    *
>>>    * @return <code>true</code> if this is a {@link Leaf},
>>>    *         <code>false</code> if not
>>>    */
>>>   boolean isLeaf();
>>> 
>>>   /**
>>>    * Returns this instance as a {@link Leaf} if possible. Can be used
>>>    * to access leaf nodes without <code>instanceof</code> checks
or
>>>    * explicit type casting. A typical access pattern is:
>>>    * <pre>
>>>    * Leaf leaf = tree.asLeaf();
>>>    * if (leaf != null) {
>>>    *     // handle leaf content
>>>    * } else {
>>>    *     // handle non-leaf content
>>>    * }
>>>    * </pre>
>>>    *
>>>    * @return this instance as a {@link Leaf},
>>>    *         or <code>null</code> if this is a non-leaf node
>>>    */
>>>   Leaf asLeaf();
>>> 
>>> }
>>> 
>>> /**
>>> * Leaves are special {@link Tree} nodes contain typed data like strings,
>>> * numbers, binaries, etc. This interface extends {@link Tree} and thus
>>> * also {@link Map}, but all Leaf instances are guaranteed to contain
>>> zero
>>> * child nodes. Leaves are always immutable.
>>> */
>>> interface Leaf extends Tree {
>>> 
>>>   // TODO: Add data access methods
>>> 
>>> }
>> 
> 


Mime
View raw message