From: Michael Dürig
To: oak-dev@jackrabbit.apache.org
Date: Thu, 4 Apr 2013 12:34:40 +0100
Subject: References, referenceables and referential integrity
Message-ID: <515D6550.5080908@apache.org>

Hi,

I was looking into how to enforce referential integrity for referenceable nodes (https://issues.apache.org/jira/browse/OAK-685, https://issues.apache.org/jira/browse/OAK-101).

Currently references are implemented through a (unique) query index on the uuid property. Resolving references and finding references to a referenceable node thus involves doing a query. If we want to enforce referential integrity in this design, we'd need access to an up-to-date query index from within the respective commit hook. This could be either through a query engine or some other means to access the uuid index directly.

Instead, we could change the design such that no query index is needed to track references. In such a design each referenced node would contain back references to all nodes referring to it. A commit hook could be employed to keep the back references up to date. Furthermore, that commit hook could simply enforce referential integrity by checking whether the set of back references is empty on remove.

However, this design is not enough to ensure uniqueness of uuids and to look up nodes by uuid. For this we still need some kind of index structure. So we could roll our own here or reuse query indexes.
In the latter case the commit hook again needs access to the query index in order to do its job of updating back references.

In summary the options are:

a) Build our own ad-hoc index structure for uuid uniqueness and lookup. Use back references to find referring nodes and to enforce referential integrity.

b) Use query indexes for uuid uniqueness and lookup, for enforcing referential integrity in a commit hook, and for finding referring nodes.

c) Use query indexes for uuid uniqueness and lookup and for enforcing referential integrity in a commit hook. Use back references to find referring nodes. In this scenario the commit hook still needs access to the query index in order to be able to properly update the back references.

I'm not in favour of c) since it adds complexity from both worlds and I don't see much added value. For b), it would be best if we had a way to access query indexes without having to go through an actual query. Finally, a) duplicates some of the indexing logic we already have for query indexes, but can do that in a way which is optimal for handling references.

Implementation-wise, b) would be the least effort and a) is probably the leanest, cleanest and meanest solution.

WDYT?

Michael
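
To make option a) a bit more concrete, here is a rough in-memory sketch in Java. It is only an illustration under assumptions: the class ReferenceTracker and all of its methods are hypothetical names, nothing here is Oak API, and a real commit hook would work on node states rather than on plain maps. It shows the two pieces a) needs: an ad-hoc uuid index for uniqueness and lookup, and back references for finding referring nodes and rejecting the removal of a node that is still referenced.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of option a); not part of Oak. */
class ReferenceTracker {
    // Ad-hoc index: uuid -> path of the referenceable node (uniqueness and lookup).
    private final Map<String, String> uuidToPath = new HashMap<>();
    // Back references: uuid of the target -> paths of the nodes referring to it.
    private final Map<String, Set<String>> backRefs = new HashMap<>();

    /** Registers a referenceable node; fails if the uuid is already taken. */
    void addReferenceable(String uuid, String path) {
        if (uuidToPath.putIfAbsent(uuid, path) != null) {
            throw new IllegalStateException("Duplicate uuid: " + uuid);
        }
    }

    /** Looks up a node path by uuid without running a query; null if unknown. */
    String lookup(String uuid) {
        return uuidToPath.get(uuid);
    }

    /** Records a reference and updates the target's back references. */
    void addReference(String fromPath, String targetUuid) {
        if (!uuidToPath.containsKey(targetUuid)) {
            throw new IllegalStateException("Dangling reference to " + targetUuid);
        }
        backRefs.computeIfAbsent(targetUuid, k -> new HashSet<>()).add(fromPath);
    }

    /** Drops a reference and the matching back reference. */
    void removeReference(String fromPath, String targetUuid) {
        Set<String> refs = backRefs.get(targetUuid);
        if (refs != null) {
            refs.remove(fromPath);
            if (refs.isEmpty()) {
                backRefs.remove(targetUuid);
            }
        }
    }

    /** Returns the paths of all nodes currently referring to the given uuid. */
    Set<String> referrers(String uuid) {
        return backRefs.getOrDefault(uuid, Collections.<String>emptySet());
    }

    /** Removes a referenceable node only if no back references remain. */
    void removeReferenceable(String uuid) {
        if (!referrers(uuid).isEmpty()) {
            throw new IllegalStateException("Node " + uuid + " is still referenced");
        }
        uuidToPath.remove(uuid);
    }
}
```

In this sketch the check on remove is just an emptiness test on the back reference set, which is the part that would live in the commit hook. The uuid map is the piece that duplicates what a unique query index already provides.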