Date: Wed, 24 Feb 2010 11:45:29 +0100
Message-ID: <91f3b2651002240245w124a1b27w32c8b5bb335ffb40@mail.gmail.com>
Subject: [jr3] EventJournal / who merges changes
From: Thomas Müller
To: dev

== Current Behavior ==

Currently Jackrabbit tries to
merge changes when two sessions add/change/remove different properties concurrently on the same node. As far as I understand, Jackrabbit merges changes by looking at the data (baseline, currently stored, and new). The same applies to child nodes: when two sessions add different child nodes concurrently, both child nodes are added.

There are some problems. For example, when using b-tree mechanisms for child nodes: one session adds child nodes that cause the child node list to split, and a second session adds a different child node (possibly causing a different split). To the second session it looks like some child nodes have been removed, and it would add the child node on the wrong (b-tree) level (in the inner node instead of in the leaf node).

I think merging changes is problematic. Trying to derive the logical operation from "diffing" the old and new versions is sometimes very hard. I suggest merging changes in a different way.

== Proposed Solution ==

When adding/changing/removing a property or node, the logical operation should be recorded on a high level ("this node was added", "this node was moved from here to there", "this property was added"). The record is first kept in memory, but when there are many changes it needs to be persisted (possibly only temporarily).

When committing a transaction (usually Session.save()), the micro-kernel tries to apply the changes. If there is a conflict, the micro-kernel rejects the changes (it doesn't try to merge). The higher level then has to deal with that. One way to deal with conflict resolution is:

1) Reload the current persistent state (undo all changes, load the new data).
2) Replay the logical operations from the (in-memory or persisted) journal.
3) If that fails again, depending on a timeout, go to 1) or fail.
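
The three steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Jackrabbit code: Store, Snapshot, Session and Operation are hypothetical stand-ins, the "micro-kernel" is a plain map with a single version counter (the proposal talks about a read timestamp per node bundle), and the journal is kept in memory only. The name ConcurrentUpdateException is borrowed from the API suggestion in this mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplaySketch {

    /** Thrown when the stored data changed after the session loaded it. */
    static class ConcurrentUpdateException extends Exception {}

    /** A recorded logical operation (hypothetical type). */
    interface Operation {
        void apply(Map<String, String> state);
    }

    /** Immutable view of the store at one version (hypothetical type). */
    static final class Snapshot {
        final long version;
        final Map<String, String> data;
        Snapshot(long version, Map<String, String> data) {
            this.version = version;
            this.data = data;
        }
    }

    /** Stand-in for the micro-kernel: rejects stale writes, never merges. */
    static class Store {
        private Map<String, String> data = new TreeMap<>();
        private long version = 0;

        synchronized Snapshot snapshot() {
            return new Snapshot(version, new TreeMap<>(data));
        }

        synchronized void store(Map<String, String> newData, long readVersion)
                throws ConcurrentUpdateException {
            if (readVersion != version) {
                throw new ConcurrentUpdateException();
            }
            data = new TreeMap<>(newData);
            version++;
        }
    }

    static class Session {
        private final Store store;
        private final List<Operation> journal = new ArrayList<>();
        private Snapshot base;
        private Map<String, String> state;

        Session(Store store) {
            this.store = store;
            reload();
        }

        private void reload() {
            base = store.snapshot();
            state = new TreeMap<>(base.data);
        }

        /** Records the logical operation and applies it to the local state. */
        void setProperty(String path, String value) {
            Operation op = s -> s.put(path, value);
            journal.add(op);
            op.apply(state);
        }

        /** Tries to commit; returns the number of attempts that were needed. */
        int save(long timeoutMillis) throws ConcurrentUpdateException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            int attempts = 0;
            while (true) {
                attempts++;
                try {
                    store.store(state, base.version);
                    journal.clear();
                    reload();
                    return attempts;
                } catch (ConcurrentUpdateException e) {
                    if (System.currentTimeMillis() >= deadline) {
                        throw e;                 // step 3: give up
                    }
                    reload();                    // step 1: load the new data
                    for (Operation op : journal) {
                        op.apply(state);         // step 2: replay the journal
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Store store = new Store();
        Session a = new Session(store);
        Session b = new Session(store);
        a.setProperty("/node/x", "1");
        b.setProperty("/node/y", "2");
        a.save(1000);
        int attempts = b.save(1000);
        System.out.println("b saved after " + attempts + " attempts: "
                + store.snapshot().data);
    }
}
```

In this sketch session b is rejected on its first attempt because session a committed in between; b then reloads, replays its single recorded operation and succeeds on the second attempt, so both properties end up stored without the store ever diffing or merging anything.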

What I describe here is how I understand MVCC
http://en.wikipedia.org/wiki/Multiversion_concurrency_control - "every object would also have a read timestamp, and if a transaction Ti wanted to write to object P, and the timestamp of that transaction is earlier than the object's read timestamp (TS(Ti) < RTS(P)), the transaction Ti is aborted and restarted."

So Jackrabbit would record the 'transaction Ti' on a higher level. If applying the changes fails (in the micro-kernel), Jackrabbit would automatically restart this transaction (up to a timeout).

This should also work well in a distributed environment. This case is similar to synchronizing databases.

== API ==

Instead of the current API that requires the change log to be in memory, I suggest using iterators:

    void store(Iterator newBundles, Iterator events) throws ConcurrentUpdateException

The ChangeLog consists of the new node bundles (plus, for each node bundle, the read timestamp). The event list consists of the EventJournal entries. For smaller operations, a session can keep the event journal in memory. For larger operations, the session can use a temporary file, or possibly store the data in a temporary area within the persistence layer (maybe using a different API). If the operation fails, the session would reload all bundles and re-apply the events stored in its own local event log.

Regards,
Thomas