Message-ID: <515D6550.5080908@apache.org>
Date: Thu, 4 Apr 2013 12:34:40 +0100
From: Michael Dürig <mduerig@apache.org>
To: "oak-dev@jackrabbit.apache.org" <oak-dev@jackrabbit.apache.org>
Subject: References, referenceables and referential integrity


Hi,

I was looking into how to enforce referential integrity for 
referenceable nodes (https://issues.apache.org/jira/browse/OAK-685,
https://issues.apache.org/jira/browse/OAK-101).

Currently, references are implemented through a (unique) query index on 
the uuid property. Resolving references and finding references to a 
referenceable node thus involves doing a query. If we want to enforce 
referential integrity in this design, we'd need access to an up-to-date 
query index from within the respective commit hook. This could be either 
through the query engine or through some other means of accessing the 
uuid index directly.

Instead, we could change the design such that no query index is needed 
to track references. In such a design, each referenced node would 
contain back references to all of its referrers. A commit hook could 
be employed to keep the back references up to date. Furthermore, that 
commit hook could simply enforce referential integrity by checking 
whether the set of back references is empty on remove.
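
To illustrate the back reference idea, here is a minimal in-memory 
sketch in plain Java. It is not Oak code: the class BackReferenceTracker 
and its methods are hypothetical names, nodes are identified by path 
strings for simplicity, and a real commit hook would derive the calls 
from the diff between two node states.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of back references; not Oak's actual API.
class BackReferenceTracker {
    // Maps a referenced node's path to the paths of all nodes referring to it.
    private final Map<String, Set<String>> backRefs = new HashMap<>();

    // Called when a reference property pointing at 'target' is added to 'referrer'.
    void addReference(String referrer, String target) {
        backRefs.computeIfAbsent(target, k -> new HashSet<>()).add(referrer);
    }

    // Called when such a reference property is removed again.
    void removeReference(String referrer, String target) {
        Set<String> refs = backRefs.get(target);
        if (refs != null) {
            refs.remove(referrer);
            if (refs.isEmpty()) {
                backRefs.remove(target);
            }
        }
    }

    // Finds all referring nodes without running a query.
    Set<String> referrersOf(String target) {
        Set<String> refs = backRefs.get(target);
        if (refs == null) {
            return new HashSet<>();
        }
        return new HashSet<>(refs);
    }

    // Referential integrity check: removal is only allowed when the
    // set of back references is empty.
    void checkRemove(String target) {
        if (backRefs.containsKey(target)) {
            throw new IllegalStateException(
                    "Cannot remove " + target + ": still referenced by " + backRefs.get(target));
        }
    }
}
```

With this model, finding the referrers of a node and vetoing its 
removal are both simple lookups on the referenced node's own data.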

However, this design alone is not enough to ensure uniqueness of uuids 
or to look up nodes by uuid. For this we still need some kind of index 
structure. We could either roll our own here or reuse query indexes. In 
the latter case the commit hook again needs access to the query index in 
order to do its job of updating back references.
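
As a rough idea of what "rolling our own" could mean, the following 
Java sketch keeps a map from uuid to node path and rejects duplicates. 
The class UuidIndex and its methods are hypothetical and purely 
illustrative; a real implementation would have to store the index in 
the content tree rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ad-hoc uuid index; not part of Oak's actual API.
class UuidIndex {
    // Maps each uuid to the path of the single node carrying it.
    private final Map<String, String> pathByUuid = new HashMap<>();

    // Registers a uuid for a node and enforces uniqueness: the same uuid
    // may not be claimed by a second, different node.
    void put(String uuid, String path) {
        String existing = pathByUuid.putIfAbsent(uuid, path);
        if (existing != null && !existing.equals(path)) {
            throw new IllegalStateException(
                    "uuid " + uuid + " is already used by " + existing);
        }
    }

    // Drops the entry when the node is removed or loses its uuid.
    void remove(String uuid) {
        pathByUuid.remove(uuid);
    }

    // Resolves a uuid to a node path, or returns null if it is unknown.
    String lookup(String uuid) {
        return pathByUuid.get(uuid);
    }
}
```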

In summary the options are:

a) Build our own ad-hoc index structure for uuid uniqueness and lookup. 
Use back references to find referring nodes and to enforce referential 
integrity.

b) Use query indexes for uuid uniqueness and lookup, for enforcing 
referential integrity in a commit hook, and for finding referring nodes.

c) Use query indexes for uuid uniqueness and lookup and for enforcing 
referential integrity in a commit hook. Use back references to find 
referring nodes. In this scenario the commit hook still needs access to 
the query index in order to be able to properly update the back references.

I'm not in favour of c) since it adds complexity from both worlds and I 
don't see much added value.

For b), it would be best if we had a way to access query indexes without 
having to go through an actual query.
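
One possible shape for such direct access, sketched in Java under 
loud assumptions: the IndexLookup interface, the index name 
"reference" and the class ReferentialIntegrityCheck are all invented 
for illustration and do not exist in Oak in this form.

```java
import java.util.Set;

// Hypothetical interface for reading an index directly, bypassing the
// query engine. Returns the paths of all nodes indexed under 'value'.
interface IndexLookup {
    Set<String> find(String indexName, String value);
}

// Hypothetical check a commit hook could run before a referenceable
// node is removed.
class ReferentialIntegrityCheck {
    private final IndexLookup lookup;

    ReferentialIntegrityCheck(IndexLookup lookup) {
        this.lookup = lookup;
    }

    // Rejects the removal if any node still refers to the given uuid.
    // "reference" is an assumed name for an index over reference properties.
    void beforeRemove(String uuid) {
        Set<String> referrers = lookup.find("reference", uuid);
        if (!referrers.isEmpty()) {
            throw new IllegalStateException(
                    "Node with uuid " + uuid + " is still referenced by " + referrers);
        }
    }
}
```

The point of the sketch is only that the commit hook talks to the 
index through a narrow lookup call instead of building and running a 
query.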

Finally, a) duplicates some of the indexing logic we already have for 
query indexes, but can do so in a way that is optimal for handling 
references.

Implementation-wise, b) would be the least effort, while a) is probably 
the leanest, cleanest and meanest solution.

WDYT?

Michael