jackrabbit-oak-dev mailing list archives

From Stefan Guggisberg <stefan.guggisb...@gmail.com>
Subject Re: Identifier- or hash-based access in the MicroKernel
Date Tue, 20 Nov 2012 18:01:54 GMT
hi jukka

On Tue, Nov 20, 2012 at 5:24 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
> A lot of functionality in Oak (node states, the diff and hook
> mechanisms, etc.) is based on walking down the tree hierarchy one
> level at a time. To do this, for example to access changes below
> /a/b/c, oak-core will currently request paths /a, /a/b, /a/b/c and so
> on from the underlying MK implementation.
> This would work reasonably well with MK implementations that are
> essentially big hash tables that map the full path (and revision) to
> the content at that location. Even then there's some space overhead as
> even tiny nodes (think of an ACL entry) get paired with the full path
> (and revision) of the node. The current MongoMK with its path keys
> works like this, though even there a secondary index is needed for the
> path lookups.
> The approach is less ideal for MK implementations (like the default
> H2-based one) that have to traverse the path when some content is
> accessed. For example, with the above oak-core access pattern, the
> sequence of accessed nodes would be [ a, a, b, a, b, c ], where
> ideally just [ a, b, c ] would suffice. The KernelNodeStore cache in
> oak-core prevents this from being too big an issue, but ideally we'd
> be able to avoid such extra levels of caching.
> To solve that mismatch without impacting the overall architecture too
> much I'd like to propose the following:
> * When requested using the filter argument, the getNodes() call may
> (but is not required to) return special ":hash" or ":id" properties as
> parts of the (possibly otherwise empty) child node objects included in
> the JSON response.
> * When returned by getNodes(), those values can be used by the client
> instead of the normal path argument when requesting the content of
> such child nodes using other getNodes() calls. The MK implementation
> is expected to automatically detect whether a given string argument is
> a path, a hash or an identifier, possibly as simply as looking at
> whether it starts with a slash.
> * Both ":hash" and ":id" values are expected to uniquely identify a
> specific immutable state of a node. The only difference is that the
> inequality of two hashes implies the inequality of the referenced
> nodes (which can be used by oak-core to optimize some operations),
> whereas it's possible for two different ids to refer to nodes with the
> exact same content.
> Such a solution would allow the following sequence
>    getNodes("/") => { "a": {} }
>    getNodes("/a") => { "b": {} }
>    getNodes("/a/b") => { "c": {} }
>    getNodes("/a/b/c") => {}
> to become something like
>    getNodes("/") => { "a": { ":id": "x" } }
>    getNodes("x") => { "b": { ":id": "y" } }
>    getNodes("y") => { "c": { ":id": "z" } }
>    getNodes("z") => {}
> with x, y and z being some implementation-specific identifiers, like
> ObjectIDs in MongoDB.
> In any case the MK implementation would still be required to support
> access by full path.
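
The access pattern described above can be sketched with a toy in-memory model. This is only an illustration, not the MicroKernel API: ToyKernel, its one-argument getNodes() and the visited list are hypothetical names, and the "starts with a slash" check is the simple detection rule mentioned in the proposal. The sketch records which nodes a traversing implementation touches when /a/b/c is reached by path versus by id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IdWalkSketch {
    // Hypothetical node: a name, an implementation-specific id, and named children.
    static final class Node {
        final String name;
        final String id;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String name, String id) { this.name = name; this.id = id; }
    }

    // Toy stand-in for an MK implementation that traverses paths; not the real API.
    static final class ToyKernel {
        final Node root = new Node("", "r");
        final Map<String, Node> byId = new LinkedHashMap<>();
        final List<String> visited = new ArrayList<>(); // non-root nodes touched so far

        Node add(Node parent, String name, String id) {
            Node child = new Node(name, id);
            parent.children.put(name, child);
            byId.put(id, child);
            return child;
        }

        // Returns child name -> child id. A leading slash means "path",
        // anything else is treated as an id (assumed detection rule).
        Map<String, String> getNodes(String pathOrId) {
            Node node;
            if (pathOrId.startsWith("/")) {
                node = root;
                for (String segment : pathOrId.split("/")) {
                    if (segment.isEmpty()) continue;
                    node = node.children.get(segment);
                    if (node == null) throw new IllegalArgumentException("no such path: " + pathOrId);
                    visited.add(node.name); // every path segment costs one node access
                }
            } else {
                node = byId.get(pathOrId);
                if (node == null) throw new IllegalArgumentException("no such id: " + pathOrId);
                visited.add(node.name); // direct lookup touches only the target node
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Node child : node.children.values()) result.put(child.name, child.id);
            return result;
        }
    }

    // Builds /a/b/c with the ids x, y, z used in the example above.
    static ToyKernel sample() {
        ToyKernel mk = new ToyKernel();
        Node a = mk.add(mk.root, "a", "x");
        Node b = mk.add(a, "b", "y");
        mk.add(b, "c", "z");
        return mk;
    }

    // Walks down one level at a time, requesting the full path at each level.
    static List<String> walkByPath(ToyKernel mk, String... names) {
        mk.visited.clear();
        mk.getNodes("/");
        String path = "";
        for (String name : names) {
            path += "/" + name;
            mk.getNodes(path);
        }
        return new ArrayList<>(mk.visited);
    }

    // Walks down one level at a time, reusing the id returned by the parent.
    static List<String> walkById(ToyKernel mk, String... names) {
        mk.visited.clear();
        Map<String, String> children = mk.getNodes("/");
        for (String name : names) {
            String id = children.get(name);
            if (id == null) throw new IllegalArgumentException("no child named " + name);
            children = mk.getNodes(id);
        }
        return new ArrayList<>(mk.visited);
    }

    public static void main(String[] args) {
        System.out.println(walkByPath(sample(), "a", "b", "c")); // [a, a, b, a, b, c]
        System.out.println(walkById(sample(), "a", "b", "c"));   // [a, b, c]
    }
}
```

In this toy model the path-based walk touches [a, a, b, a, b, c] while the id-based walk touches [a, b, c], which matches the two sequences quoted in the proposal.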

makes sense, +1 in general.

some comments:

- returning an :id and/or :hash should be optional, i.e. we shouldn't
  require an implementation to return an :id or :hash for every path
  (an implementation might e.g. want to persist an entire subtree as
  one single persistence entity)
- i suggest we prefix the id/path getNodes parameter value with ':id:'
  and ':hash:' (or some other scheme) when requesting nodes by hash or
  identifier to avoid a potential ambiguity (an implementation might
  support both access by hash and id).
- do you have a proposal for the suggested MicroKernel API (java doc)?
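
A minimal sketch of how the prefix scheme suggested above could be told apart from a plain path. NodeRefSketch, Kind and parse() are hypothetical names for illustration only, and the exact prefixes are just the ':id:' and ':hash:' strings floated in this mail, not an agreed format.

```java
class NodeRefSketch {
    enum Kind { PATH, ID, HASH }

    // Hypothetical parsed form of the getNodes() path/id/hash argument.
    static final class NodeRef {
        final Kind kind;
        final String value;
        NodeRef(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    static final String ID_PREFIX = ":id:";     // assumed prefix
    static final String HASH_PREFIX = ":hash:"; // assumed prefix

    // Classifies the argument without guessing: an explicit prefix selects
    // id or hash access, a leading slash selects path access.
    static NodeRef parse(String arg) {
        if (arg.startsWith(ID_PREFIX)) {
            return new NodeRef(Kind.ID, arg.substring(ID_PREFIX.length()));
        }
        if (arg.startsWith(HASH_PREFIX)) {
            return new NodeRef(Kind.HASH, arg.substring(HASH_PREFIX.length()));
        }
        if (arg.startsWith("/")) {
            return new NodeRef(Kind.PATH, arg);
        }
        throw new IllegalArgumentException("not a path, id or hash: " + arg);
    }

    public static void main(String[] args) {
        for (String arg : new String[] { "/a/b/c", ":id:x", ":hash:0f3a" }) {
            NodeRef ref = parse(arg);
            System.out.println(ref.kind + " " + ref.value);
        }
    }
}
```

With explicit prefixes, an implementation that supports both kinds of lookup never has to infer from the value itself whether "x" is an id or a hash.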


> BR,
> Jukka Zitting
