jackrabbit-oak-dev mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: References, referenceables and referential integrity
Date Thu, 11 Apr 2013 22:21:08 GMT
I've used an approach similar to a) in the past (on other projects), and it
was efficient and successful for both bi-directional referential integrity
and optimised queries. I also found approaches similar to b) and c)
problematic, both from a write throughput point of view and from a query
point of view. I don't know enough about the low-level details of Oak to
really be of any help, other than to say that requiring a query index that
is up to date to the millisecond, at scale, and transactional is quite hard
to achieve without adding overhead to writes.
Ian
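A minimal sketch of option a) from the quoted proposal below, assuming a toy in-memory model rather than Oak's actual node-state or commit-hook API. An ad-hoc uuid index provides uniqueness and lookup, and per-target back references answer a getReferences()-style query and block removal of nodes that are still referenced. `ReferenceTracker`, `IntegrityError` and all method names are hypothetical.

```python
class IntegrityError(Exception):
    """Raised when a change would break uuid uniqueness or referential integrity."""


class ReferenceTracker:
    """Toy model of option a): an ad-hoc uuid index plus back references.

    All names are hypothetical; this is not Oak's API.
    """

    def __init__(self):
        self.uuid_to_path = {}  # ad-hoc index: uuid -> path (uniqueness + lookup)
        self.back_refs = {}     # target uuid -> set of (referrer path, property name)

    def add_referenceable(self, path, uuid):
        # Uniqueness is checked against the ad-hoc index; no query is involved.
        if uuid in self.uuid_to_path:
            raise IntegrityError(f"duplicate uuid {uuid}")
        self.uuid_to_path[uuid] = path
        self.back_refs[uuid] = set()

    def lookup(self, uuid):
        # Node-by-uuid lookup is a direct index hit (None if unknown).
        return self.uuid_to_path.get(uuid)

    def add_reference(self, referrer_path, prop, target_uuid):
        # Refuse dangling references, then record the back reference on the target.
        if target_uuid not in self.uuid_to_path:
            raise IntegrityError(f"reference to unknown uuid {target_uuid}")
        self.back_refs[target_uuid].add((referrer_path, prop))

    def remove_reference(self, referrer_path, prop, target_uuid):
        self.back_refs.get(target_uuid, set()).discard((referrer_path, prop))

    def get_references(self, uuid):
        # Counterpart of Node.getReferences(): read the back references directly.
        return sorted(self.back_refs.get(uuid, set()))

    def remove_referenceable(self, uuid):
        # Referential integrity: removal is refused while back references remain.
        if self.back_refs.get(uuid):
            raise IntegrityError(f"uuid {uuid} is still referenced")
        self.uuid_to_path.pop(uuid, None)
        self.back_refs.pop(uuid, None)


tracker = ReferenceTracker()
tracker.add_referenceable("/content/a", "uuid-1")
tracker.add_reference("/content/b", "link", "uuid-1")
print(tracker.get_references("uuid-1"))  # [('/content/b', 'link')]
```

Both lookups are plain map accesses, which is the point of a): neither resolving a uuid nor finding referrers needs a query index to be current at commit time.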


On 11 April 2013 20:01, Michael Dürig <mduerig@apache.org> wrote:

>
> Hi,
>
> Here is a summary of a quick face-to-face discussion Jukka, Angela, Tom and I
> had today: since there is no index for finding all references to a node,
> using a query is troublesome here. We should thus update the code such that
> referenced nodes maintain back references to their referrers, and use a
> commit hook to keep the set of back references up to date. These back
> references would then be used to enforce referential integrity of
> referenceable nodes and to implement Node.getReferences() (instead of the
> inefficient query-based implementation we have today).
>
> Michael
>
> On 4.4.13 13:34, Michael Dürig wrote:
>
>>
>> Hi,
>>
>> I was looking into how to enforce referential integrity for
>> referenceable nodes (https://issues.apache.org/jira/browse/OAK-685,
>> https://issues.apache.org/jira/browse/OAK-101).
>>
>> Currently, references are implemented through a unique query index on
>> the uuid property. Resolving references and finding references to a
>> referenceable node thus involves running a query. If we want to enforce
>> referential integrity in this design, we'd need access to an up-to-date
>> query index from within the respective commit hook, either through the
>> query engine or through some other means of accessing the uuid index
>> directly.
>>
>> Instead, we could change the design such that no query index is needed
>> to track references. In such a design, referenced nodes would contain
>> back references to all their referrers. A commit hook could be employed
>> to keep the back references up to date. Furthermore, that commit hook
>> could simply enforce referential integrity by checking whether the set
>> of back references is empty on remove.
>>
>> However, this design is not enough to ensure uniqueness of uuids and to
>> look up nodes by uuid. For this we still need some kind of index
>> structure. So we could either roll our own here or reuse query indexes. In
>> the latter case, the commit hook again needs access to the query index in
>> order to do its job of updating back references.
>>
>> In summary the options are:
>>
>> a) Build our own ad-hoc index structure for uuid uniqueness and lookup.
>> Use back references to find referring nodes and to enforce referential
>> integrity.
>>
>> b) Use query indexes for uuid uniqueness and lookup, for enforcing
>> referential integrity in a commit hook, and for finding referring nodes.
>>
>> c) Use query indexes for uuid uniqueness and lookup, and for enforcing
>> referential integrity in a commit hook. Use back references to find
>> referring nodes. In this scenario, the commit hook still needs access to
>> the query index in order to properly update the back references.
>>
>> I'm not in favour of c) since it adds complexity from both worlds and I
>> don't see much added value.
>>
>> For b), it would be best if we had a way to access query indexes without
>> having to go through an actual query.
>>
>> Finally a) duplicates some of the indexing logic we have already for
>> query indexes, but can do that in a way which is optimal for handling
>> references.
>>
>> Implementation-wise, b) would be the least effort and a) is probably the
>> leanest, cleanest and meanest solution.
>>
>> WDYT?
>>
>> Michael
>>
>>
>>
>>
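The commit hook described in the thread (keep back references up to date on every commit, and refuse a removal while a node's back-reference set is non-empty) can be sketched as a pure function over before/after states. This is a toy under stated assumptions, not Oak's CommitHook interface: a node is a hypothetical dict with an optional `"uuid"` and a `"refs"` map from property name to target uuid, and `back_reference_hook` / `CommitFailed` are illustrative names. For brevity it rebuilds the uuid index from the whole after-state; a real hook would presumably work from the diff only.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy model: path -> {"uuid": str (optional), "refs": {property: target uuid}}
State = Dict[str, dict]
BackRefs = Dict[str, Set[Tuple[str, str]]]  # target uuid -> {(referrer path, property)}


class CommitFailed(Exception):
    """Raised by the hook to reject a commit."""


def _refs(state: State, path: str) -> Dict[str, str]:
    node = state.get(path)
    return dict(node.get("refs", {})) if node else {}


def _uuid_index(state: State) -> Dict[str, str]:
    """Build uuid -> path for all referenceable nodes, rejecting duplicate uuids."""
    index: Dict[str, str] = {}
    for path, node in state.items():
        uuid = node.get("uuid")
        if uuid is None:
            continue
        if uuid in index:
            raise CommitFailed(f"duplicate uuid {uuid} at {index[uuid]} and {path}")
        index[uuid] = path
    return index


def back_reference_hook(before: State, after: State, back_refs: BackRefs) -> BackRefs:
    """Return the back references updated for before -> after, or raise CommitFailed."""
    updated = {uuid: set(refs) for uuid, refs in back_refs.items()}
    after_index = _uuid_index(after)
    for path in before.keys() | after.keys():
        old, new = _refs(before, path), _refs(after, path)
        for prop, target in old.items():
            if new.get(prop) != target:  # reference removed or retargeted
                updated.get(target, set()).discard((path, prop))
        for prop, target in new.items():
            if old.get(prop) != target:  # reference added or retargeted
                if target not in after_index:
                    raise CommitFailed(f"{path}/{prop} points to unknown uuid {target}")
                updated.setdefault(target, set()).add((path, prop))
    # Referential integrity: a removed referenceable node must have no referrers left.
    for uuid in _uuid_index(before).keys() - after_index.keys():
        if updated.get(uuid):
            raise CommitFailed(f"{uuid} is still referenced by {sorted(updated[uuid])}")
        updated.pop(uuid, None)
    return updated
```

Because the hook works on a copy and either raises or returns a new map, a rejected commit leaves the previous back references untouched. Removing a referrer and its target in the same commit succeeds, since stale back references are discarded before the integrity check runs.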
