jackrabbit-oak-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Boston <...@tfd.co.uk>
Subject Re: Question: wide hierachies
Date Wed, 17 Oct 2012 23:06:26 GMT
Hi Stefan,
comments inline:

On 18 October 2012 02:46, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
> hi ian,
> On Sun, Oct 14, 2012 at 2:49 AM, Ian Boston <ieb@tfd.co.uk> wrote:
>> Hi,
>> IIUC Oak has the potential to handle wide hierarchies where there are
>> millions of child nodes to a parent and hence support some of the use
>> cases commonly found in Social Media and cloud applications. This
>> question is about the implementation to support that.
>> IIUC from reading the code child nodes are accessed from the parent
>> node, and to avoid the situation in JR2 where child nodes were
>> represented as an array of objects stored with the parent node, child
>> nodes in Oak are represented in a internal tree structure where the
>> width of the tree is controlled to ensure concurrency and performance
>> is maintained. (did I understand that correctly ?).
> in JR2 the list of child node entries (name-id pairs) is stored within the
> parent node. while this is very efficient for small to medium sized child
> node lists (up to a couple of k entries) read and write performance suffers
> when the list grows very large.
> in the current MicroKernel implementation we use an htree based intermediate
> tree structure for storing large child node collections.

phew, thats what I understood.

>> In choosing this approach, was a possible alternative considered where
>> the child node contains a single pointer to its parent, and the child
>> node is retrieved by hashing it's path rather ie  hash-f(path) ===
>> child-node-storage-id, (eg sha1(path) == child-node-storage-id )
>> rather than navigating from the parent.
> the data model of the current MicroKernel implementation is essentially
> just a DAG of node objects. the versioning model is git-like, i.e.
> every change results in a new immutable tree.
> i don't see how child->parent relationships could be implemented with
> this model. IIUC with path-based hashing you'd sacrifice MVCC
> or am i missing something?

Hmm, good point, your right, you cant MVCC the tree structure, only
the data within the tree, since the path -> pointer transformation is
immutable, thats a pity.

Thanks for explaining.

>> In my exposure to use cases for social media and cloud like systems I
>> have observed several things. Listing child nodes from a parent
>> becomes rarer the closer the application moves towards social media.
>> Direct pointer based access derived from a hierarchical url is the
>> main use case. For non hierarchal pointer, searching becomes dominant.
>> Occasionally listing is required but only as a variant of search.
>> Scans or sorted scans as nearly always avoided as being impractical.
>> In social media content, the cardinality of all parents is low making
>> finding children of parents amenable to an inverted index approach. Ie
>> the parent becomes a search keyword term.
>> (the closer to Enterprise Content Management the application is the
>> inverse of the above is true)
>> If I understood all of that correctly, my question:
>> In circumstances where the use cases are more to the social media end
>> of the spectrum, will it be possible to invert the child-parent
>> relationship within Oak to be a pointer from the child with an
>> optional, additional inverted index should finding children of a
>> specific parent be required ?
> WRT the MicroKernel implementation:
> IMO that's unfortunately not possible with the current git-like
> approach (content-based identifiers, DAG of immutable versions).
> cheers
> stefan
>> Apologies if this is already covered and I missed it. The archives at
>> Markmail didn't turn anything up.
>> Thanks
>> Ian

View raw message