Hi, On Fri, Mar 2, 2012 at 5:17 PM, Stefan Guggisberg wrote: > On Thu, Mar 1, 2012 at 8:13 PM, Jukka Zitting wrote: >> * The data model specifies that a node contains an "an array of >> (child) node objects" and seems to imply that child nodes are always >> orderable. This is a major design constraint for the underlying >> storage model that doesn't seem necessary (a higher-level component >> could store ordering information explicitly) or desirable (see past >> discussions on this). To avoid this I think child nodes should be >> treated as an unordered set of name/node mappings. > > i don't think that it is a major design constraint in general. > since in a jcr repository a lot of content is expected to be > ordered (-> nt:unstructured) we should IMO support this in the mk > and don't delegate this to the upper layer. I'm OK with that tradeoff. Basically we're saying that child nodes are both ordered (in some stable order) *and* orderable (in that they can be explicitly ordered). This means that the MK implementation must store explicit ordering information for nodes, which can be a bit costly for large nodes, as you mention below. > i agree that very 'flat' nodes are a special case. > how about stating something along the line of: > > child nodes are an orderable (implying 'ordered' of course) set > of name/node mappings. however, if the size of the set > exceeds a certain (discoverable?) trheshold, it might just > ordered, but not orderable. That's a bit vague, but it should be fine if we explain the tradeoff properly. Perhaps something like the following? /** * Returns an iterable of the child node entries starting from the * given offset and containing the given number of entries. The order * of child nodes is normally as specified by the client that created * or reordered them. *

* The underlying implementation may for performance reasons decide * to automatically reorder the children of large nodes (with more than * 1,000 child nodes). In such cases the original ordering of nodes * is lost and the nodes may no longer be explicitly orderable, but the * children are still ordered in some stable, implementation-dependant * order. * * @param offset start offset from which to return entries; * with 0 being the offset of the first entry * @param length number of entries to return; * use -1 to return all remaining entries */ Iterable getChildNodeEntries(int offset, int length); Alternatively, to avoid such vagueness without imposing performance penalties on large, flat nodes, we could define two different iteration orders, one for the explicitly specified order and a native, more efficient one. That would allow efficient tree traversal for cases like indexing or garbage collection where ordering is not relevant, while still maintaining explicit ordering for cases where it's needed (and that are willing to live with a possible performance hit). /** * Returns an iterable of the child node entries starting from the * given offset and containing the given number of entries. *

* The order of child nodes is by default as specified by the * client that created or reordered them, but the caller can also * ask the underlying implementation to return nodes in their * native order that may be more efficient to iterate over. * To request such native ordering, the caller should specify * the offset parameter in ones' complement form * (i.e. ~offset). * * @param offset start offset from which to return entries; * with 0 being the offset of the first entry, * and negative offsets interpreted as described above * @param length number of entries to return; * use -1 to return all remaining entries */ Iterable getChildNodeEntries(int offset, int length); >> * Another unspecified bit is whether same-name-siblings need to be >> supported on the storage level. The MK implies that SNSs are not >> supported (i.e. a higher level component needs to use things like name >> mangling to implement SNSs on top of the MK), but the note about "an >> *array* of (child) node objects" kind of leaves the door open for two >> child nodes to (perhaps accidentally) have the same name. For also >> this reason I think child nodes should be treated as a map from names >> to corresponding nodes. > > agreed, good point. Note that if we do mandate orderability on child nodes, supporting SNSs on the MK level becomes much less of an issue as the internal storage model needs to be (at least some form of) a list instead of just a map. Anyway, I'm OK with explicitly scoping out SNSs from the MK level for now, though we may want or need to revisit that decision depending on how complex implementing SNSs on a higher level turns out to be. > properties and child nodes sharing the same namespace is IMO > an important requirement since we want to be close to the json model. > > for the tree abstraction (representing an immutable hierarchy) > i am fine with separate methods for getting nodes/properties. OK. Something like this perhaps: /** * Returns the named property. The name is an opaque string and * is not parsed or otherwise interpreted by this method. *

* The namespace of properties and child nodes is shared, so if * this method returns a non-null value for a given * name, then {@link #getNode(String)} is guaranteed to return * null for the same name. * * @param name name of the property to return * @return named property, or null if not found */ Property getProperty(String name); /** * Returns the named child node. The name is an opaque string and * is not parsed or otherwise interpreted by this method. *

* The namespace of properties and child nodes is shared, so if * this method returns a non-null value for a given * name, then {@link #getProperty(String)} is guaranteed to return * null for the same name. * * @param name name of the child node to return * @return named child node, or null if not found */ Node getNode(String name); Note that I used the terms Node, Property and ChildNodeEntry above since they are probably the most familiar to people. However, to avoid confusion with related JCR interfaces and existing Jackrabbit internals, it might be a good idea to use some other naming pattern. Following my original Tree draft (which would work nicely with Oak as the codename ;-), we could for example use terms Tree, Leaf and Branch, respectively. BR, Jukka Zitting